{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Product Categorization\n",
    "\n",
    "### Multi-Class Text Classification of products based on their description\n",
    "\n",
    "\n",
    "The aim of this project is multi-class text classification of makeup products based on their descriptions. Given a product description as input, we predict the product's category. There are five categories, each corresponding to a different kind of makeup product. We compare several machine learning algorithms and choose the most accurate one for this problem: Logistic Regression, Multinomial Naive Bayes, Linear Support Vector Machine (LinearSVC), Random Forest and Gradient Boosting. The analysis uses Python with the pandas, matplotlib, seaborn, NLTK and scikit-learn libraries.\n",
    "\n",
    "**Dataset**\n",
    "\n",
    "The dataset comes from http://makeup-api.herokuapp.com/ and was obtained through its API.\n",
    "\n",
    "The dataset contains real descriptions of makeup products. Each description is labeled with a product type, so this is a supervised text classification problem.\n",
    "\n",
    "Attributes:\n",
    "\n",
    "- product_type - category of makeup product.\n",
    "- description - description of makeup product."
   ]
  },
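  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough preview of the approach, the cell below fits a small text classification pipeline on a toy example. It is only a sketch: the four descriptions and their labels are made up, the names `toy_texts`, `toy_labels` and `sketch_model` are invented for this illustration, and TF-IDF with Logistic Regression stands in for just one of the algorithms listed above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch only: the toy descriptions and labels are made up.\n",
    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.pipeline import Pipeline\n",
    "\n",
    "toy_texts = [\n",
    "    'long-wearing matte lipstick with rich colour',\n",
    "    'creamy lipstick that hydrates the lips',\n",
    "    'volumizing mascara for longer lashes',\n",
    "    'waterproof mascara that lifts every lash',\n",
    "]\n",
    "toy_labels = ['lipstick', 'lipstick', 'eye_makeup', 'eye_makeup']\n",
    "\n",
    "# Turn text into TF-IDF features, then fit a linear classifier on them.\n",
    "sketch_model = Pipeline([\n",
    "    ('tfidf', TfidfVectorizer()),\n",
    "    ('clf', LogisticRegression(max_iter=1000)),\n",
    "])\n",
    "sketch_model.fit(toy_texts, toy_labels)\n",
    "\n",
    "# Predict the category of an unseen, equally made-up description.\n",
    "toy_prediction = sketch_model.predict(['a black mascara for dramatic lashes'])\n",
    "print(toy_prediction)"
   ]
  },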
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Importing packages and loading data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "import re\n",
    "import nltk\n",
    "from nltk.corpus import stopwords\n",
    "from nltk.stem import PorterStemmer\n",
    "from nltk.tokenize import sent_tokenize, word_tokenize\n",
    "\n",
    "from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.pipeline import Pipeline\n",
    "\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.naive_bayes import MultinomialNB\n",
    "from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier\n",
    "from sklearn.svm import LinearSVC\n",
    "\n",
    "from sklearn.metrics import accuracy_score, classification_report\n",
    "\n",
    "import pickle\n",
    "from joblib import dump, load"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our analysis uses only two variables: product_type and description."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>product_type</th>\n",
       "      <th>description</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>lip_liner</td>\n",
       "      <td>Lippie Pencil A long-wearing and high-intensit...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>lipstick</td>\n",
       "      <td>Blotted Lip Sheer matte lipstick that creates ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>lipstick</td>\n",
       "      <td>Lippie Stix Formula contains Vitamin E, Mango,...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>foundation</td>\n",
       "      <td>Developed for the Selfie Age, our buildable fu...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>lipstick</td>\n",
       "      <td>All of our products are free from lead and hea...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  product_type                                        description\n",
       "0    lip_liner  Lippie Pencil A long-wearing and high-intensit...\n",
       "1     lipstick  Blotted Lip Sheer matte lipstick that creates ...\n",
       "2     lipstick  Lippie Stix Formula contains Vitamin E, Mango,...\n",
       "3   foundation  Developed for the Selfie Age, our buildable fu...\n",
       "4     lipstick  All of our products are free from lead and hea..."
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('data/products_description.csv', header=0, index_col=0)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First observations:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 906 entries, 0 to 930\n",
      "Data columns (total 2 columns):\n",
      " #   Column        Non-Null Count  Dtype \n",
      "---  ------        --------------  ----- \n",
      " 0   product_type  906 non-null    object\n",
      " 1   description   906 non-null    object\n",
      "dtypes: object(2)\n",
      "memory usage: 21.2+ KB\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Shape of data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(906, 2)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Checking for missing values in the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "product_type    0\n",
       "description     0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.isnull().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Example description:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Developed for the Selfie Age, our buildable full coverage, natural matte foundation delivers flawless looking skin from day-to-night. The oil-free, lightweight formula blends smoothly and is easily customizable to create the coverage you want. Build it up or sheer it out, it was developed with innovative soft-blurring pigments to deliver true color while looking and feeling natural. The lockable pump is easy to use and keeps your routine mess-free! As always, 100% cruelty-free and vegan.'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['description'][3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Cleaning\n",
    "\n",
    "**Data type change:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['description'] = df['description'].astype(str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "94257\n"
     ]
    }
   ],
   "source": [
    "print(df['description'].apply(lambda x: len(x.split(' '))).sum())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The descriptions contain 94,257 words in total, counting space-separated tokens."
   ]
  },
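  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the count above splits on single spaces, so any run of consecutive spaces contributes empty strings to the total. The cell below is a small sketch of that effect on a made-up sample string (not taken from the dataset); `sample_text` and the two token lists are names invented for this illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical sample text containing two double spaces.\n",
    "sample_text = 'Long-wearing  matte  lipstick for colour'\n",
    "\n",
    "# split(' ') keeps an empty string for every extra space.\n",
    "tokens_single_space = sample_text.split(' ')\n",
    "\n",
    "# split() with no argument splits on any run of whitespace.\n",
    "tokens_whitespace = sample_text.split()\n",
    "\n",
    "print(len(tokens_single_space), len(tokens_whitespace))"
   ]
  },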
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['lip_liner', 'lipstick', 'foundation', 'eyeliner', 'eyeshadow',\n",
       "       'blush', 'bronzer', 'mascara', 'eyebrow', 'nail_polish'],\n",
       "      dtype=object)"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.product_type.unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Grouping data into a smaller number of categories:\n",
    "\n",
    "There are 10 unique product types, and some of them can be merged into a single category. We combine 'eyeliner', 'eyeshadow', 'mascara' and 'eyebrow' into one group called 'eye_makeup'. Likewise, 'blush' and 'bronzer' become 'contour', and 'lip_liner' is merged into 'lipstick'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "foundation     159\n",
       "lipstick       148\n",
       "eyeliner       145\n",
       "mascara         91\n",
       "eyeshadow       86\n",
       "blush           75\n",
       "bronzer         69\n",
       "nail_polish     60\n",
       "eyebrow         45\n",
       "lip_liner       28\n",
       "Name: product_type, dtype: int64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.product_type.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "def grouping_data(df):\n",
    "    df.loc[df['product_type'].isin(['lipstick','lip_liner']),'product_type'] = 'lipstick'\n",
    "    df.loc[df['product_type'].isin(['blush','bronzer']),'product_type'] = 'contour'\n",
    "    df.loc[df['product_type'].isin(['eyeliner','eyeshadow','mascara','eyebrow']),'product_type'] = 'eye_makeup'\n",
    "    return df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = grouping_data(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This leaves five categories: eye makeup, lipstick, foundation, contour and nail polish."
   ]
  },
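  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same grouping can also be written as a lookup table, which keeps the category rules in one place and leaves the input frame unchanged. This is a sketch of an alternative, not the function used in this notebook; `CATEGORY_MAP`, `group_with_map` and the tiny `demo` frame are names invented for the illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical lookup table mirroring the grouping rules described above.\n",
    "CATEGORY_MAP = {\n",
    "    'lip_liner': 'lipstick',\n",
    "    'blush': 'contour',\n",
    "    'bronzer': 'contour',\n",
    "    'eyeliner': 'eye_makeup',\n",
    "    'eyeshadow': 'eye_makeup',\n",
    "    'mascara': 'eye_makeup',\n",
    "    'eyebrow': 'eye_makeup',\n",
    "}\n",
    "\n",
    "def group_with_map(frame):\n",
    "    # Work on a copy so the caller's frame is not modified.\n",
    "    grouped = frame.copy()\n",
    "    # Labels missing from the map (e.g. 'foundation') are left as they are.\n",
    "    grouped['product_type'] = grouped['product_type'].replace(CATEGORY_MAP)\n",
    "    return grouped\n",
    "\n",
    "demo = pd.DataFrame({'product_type': ['lip_liner', 'bronzer', 'mascara', 'foundation']})\n",
    "demo_grouped = group_with_map(demo)\n",
    "print(demo_grouped['product_type'].tolist())"
   ]
  },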
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Description length in characters:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>product_type</th>\n",
       "      <th>description</th>\n",
       "      <th>length</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>lipstick</td>\n",
       "      <td>Lippie Pencil A long-wearing and high-intensit...</td>\n",
       "      <td>232</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>lipstick</td>\n",
       "      <td>Blotted Lip Sheer matte lipstick that creates ...</td>\n",
       "      <td>146</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>lipstick</td>\n",
       "      <td>Lippie Stix Formula contains Vitamin E, Mango,...</td>\n",
       "      <td>188</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>foundation</td>\n",
       "      <td>Developed for the Selfie Age, our buildable fu...</td>\n",
       "      <td>492</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>lipstick</td>\n",
       "      <td>All of our products are free from lead and hea...</td>\n",
       "      <td>357</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  product_type                                        description  length\n",
       "0     lipstick  Lippie Pencil A long-wearing and high-intensit...     232\n",
       "1     lipstick  Blotted Lip Sheer matte lipstick that creates ...     146\n",
       "2     lipstick  Lippie Stix Formula contains Vitamin E, Mango,...     188\n",
       "3   foundation  Developed for the Selfie Age, our buildable fu...     492\n",
       "4     lipstick  All of our products are free from lead and hea...     357"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['length'] = df['description'].apply(len)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data analysis\n",
    "\n",
    "We check the distribution of the product_type variable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "eye_makeup     367\n",
       "lipstick       176\n",
       "foundation     159\n",
       "contour        144\n",
       "nail_polish     60\n",
       "Name: product_type, dtype: int64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.product_type.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEXCAYAAABCjVgAAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3deZgcVb3/8fcHCItsERkUkkBQwhVQiTAkbCqbbC5BLiAoCFw0LiDCIz6i3iuL4gpENtEgkASRRUCIiPzEICAghIAxBBCNECQmJmELASSa8P39cU7XVDrdM51henoy83k9Tz9TfepU1beqeupbdar6tCICMzMzgNVaHYCZmfUdTgpmZlZwUjAzs4KTgpmZFZwUzMys4KRgZmYFJwVbKZImSPpmi5YtSZdLel7S1BYsfw9Jc3p7uVUx/FrS0a2MoUzS6ZJ+2uo4rOc4KaziJM2WNF/SuqWyT0q6o4VhNcvuwPuBoRExqtXBdJekOyR9sjvTRsQBETGxp2Myq3BS6B/WAL7Q6iBWlqTVV3KSLYDZEfFyDyxbkvz578MkrdHqGAYi/1P0D98HTpE0uHqEpOGSovwPVj5TlXSMpHskjZP0gqQnJO2ay5+WtKBGc8XGkm6TtFjSnZK2KM377Xncc5Iel3RYadwESRdLukXSy8CeNeLdTNLkPP0sSZ/K5ccBPwF2kfSSpDNqTFtZlwskLZL0Z0l7V633WZLuAV4B3lpvebn+Ojnm5yU9CuxUtbyQtFXV+n2z9H6MpOmSXpT0N0n7SzoLeA9wYV6PC3OCGpe39SJJMyS9o3r96uy7uyWdnWN8UtIBtaYrbdvrJS3MdU8sjRsl6Q/5MzAvx7Vmafx2pf06X9JXS7NeU9Kk/Hl4RFJ7JzHsmz8XiyT9MH9+an0WnwNOl7SapP+V9FTePpMkbZjrr9Ccp3TlvE8ePl3SdZKuybE9JGn7erFZ4qTQP0wD7gBO6eb0o4EZwJuAnwFXkw6AWwFHkg5g65Xqfxz4BrAxMB24EkCpCeu2PI9NgCOAH0rarjTtx4CzgPWBu2vEchUwB9gMOAT4lqS9I+JS4DPAHyJivYg4rZN1eSLHdhpwg6SNSuOPAsbm5T9Vb3m57mnA2/JrP6DhtnxJo4BJwJeAwcB7SVc5XwN+D5yQ1+MEYN88futc96PAsw0uajTweF7f7wGXSlKNeFYDfgn8CRgC7A2cJGm/XGUZcHKezy55/OfytOsDvwVuJW2nrYAppdl/mPSZGQxMBi6ss002Bq4DvkL6rD0O7FpjfZ4gfX7OAo7Jrz2BtwLr1Zt/HWOAnwMbkT6XN0oatBLTDzhOCv3H14HPS2rrxrRPRsTlEbEMuAYYBpwZEUsi4jfAv0kHgopfRcRdEbEE+Brp7H0Y8EHSge/yiFgaEQ8B15MOthU3RcQ9EfFaRLxaDiLPY3fgyxHxakRMJ10dHLUS67IA+EFE/CciriEdeD5QGj8hIh6JiKXAW7pY3mHAWRHxXEQ8DZy/EnEcB1wWEbfldf1HRPy5Tt3/kJLU2wFFxGMRMa/B5TwVEZfkfTcR2BR4c416OwFtEXFmRPw7Ip4ALgEOB4iIByPivrzfZgM/Bt6Xp/0g8M+IOCdvp8URcX9p3ndHxC05hiuAemfjBwKPRMQNefufD/yzqs7ciLggx/Ev0gnIuRHxRES8REooh6vxpqUHI+K6iPgPcC6wNrBzg9MOSE4K/UREzARuBk7txuTzS8P/yvOrLitfKTxdWu5LwHOkM8gtgNG5CeIFSS+Q/qnfUmvaGjYDnouIxaWyp0hnto36Ryzfy+NTeb61lt/V8jarqv/USsQxDPhbIxUj4nbS2e9FwHxJ4yVt0OByioNqRLySB9erUW8LYLOqffNVcgKRtLWkmyX9U9KLwLdIVw2NrEv5wP4KsHadg/Zy2zPvp+qnuao/H5ux/HZ/inQPrVbiq6W8vNfouCq0
OpwU+pfTgE+x/EG0clP2DaWy8kG6O4ZVBnKz0kbAXNI/4J0RMbj0Wi8iPluatrNueecCG+XmiorNgX+sRGxDqppPNs/zrbX8rpY3j9K65nFlr1B/uz5NanaqZYVtEBHnR8SOwHakZqQv1Zm2u54mXRGW9836EXFgHn8x8GdgRERsQEoYKk1bb11WxjxgaOVN3k9Dq+pUb5u5pIRWsTmwlHQi8zKl7a/04EL1lXL5s7paXt5crC4nhX4kImaRmn9OLJUtJB3kjpS0uqT/4fX/gx8oafd8I/IbwP25eeVmYGtJR0kalF87SdqmwfifBu4Fvi1pbUnvIjXDXLkSsW0CnJiXfSiwDXBLN5d3LfAVSW+UNBT4fNUspgMfy9t1fzqaWwAuBY6VtHe+WTpE0tvzuPmk9nEA8jYandu6XwZeJbXx96SpwIuSvqx0A311Se+QVLl5vj7wIvBSjrOcyG8G3iLpJElrSVpf0uhuxPAr4J2SDspXEsfT9QnKVcDJkrbMJyDfAq7JzU9/IV2VfCBvu/8F1qqafkdJB+flnQQsAe7rRuwDhpNC/3MmsG5V2adIZ57Pks5E732dy/gZ6arkOWBHUhMRuRlmX1I79VxSs8J3WfEftTNHAMPz9L8ATouI21Zi+vuBEcAzpBuVh0REZzdtO1veGaTmiieB35Day8u+AHwIqDST3VgZERFTgWOBccAi4E46znjPAw5RemLofGADUvv+83l5zwJnr8Q6dym3938IGJnX5xnS/ZMNc5VTSA8BLM6xXFOadjHp+yEfIu3Tv1LjybEGYngGOJR0Q/xZYFvSQxJLOpnsMtJ2vyvH/So5OUfEItLN8J+QTnxeZsXmqJtIN+6fJ90rOjjfX7A65B/Zsf5C0jHAJyNi91bHYl3LzTlzgI9HxO+aMP/Tga0i4siennd/5isFM+s1kvaTNFjSWnTct3BzTh/ipGBmvWkX0pNMz5Caow7Kj55aH+HmIzMzK/hKwczMCqt0h1Mbb7xxDB8+vNVhmJmtUh588MFnIqJm7werdFIYPnw406ZNa3UYZmarFEl1v53v5iMzMys4KZiZWcFJwczMCk4KZmZWcFIwM7OCk4KZmRWcFMzMrOCkYGZmBScFMzMrrNLfaDZbWbtdsFurQ+hx93z+nlaHYP2IrxTMzKzgpGBmZgUnBTMzKzgpmJlZwUnBzMwKTUsKktaWNFXSnyQ9IumMXD5B0pOSpufXyFwuSedLmiVphqQdmhWbmZnV1sxHUpcAe0XES5IGAXdL+nUe96WIuK6q/gHAiPwaDVyc/5qZWS9p2pVCJC/lt4PyKzqZZAwwKU93HzBY0qbNis/MzFbU1HsKklaXNB1YANwWEffnUWflJqJxktbKZUOAp0uTz8ll1fMcK2mapGkLFy5sZvhmZgNOU5NCRCyLiJHAUGCUpHcAXwHeDuwEbAR8OVdXrVnUmOf4iGiPiPa2tpq/O21mZt3UK08fRcQLwB3A/hExLzcRLQEuB0blanOAYaXJhgJzeyM+MzNLmvn0UZukwXl4HWAf4M+V+wSSBBwEzMyTTAY+kZ9C2hlYFBHzmhWfmZmtqJlPH20KTJS0Oin5XBsRN0u6XVIbqbloOvCZXP8W4EBgFvAKcGwTYzMzsxqalhQiYgbw7hrle9WpH8DxzYrHzMy65m80m5lZwUnBzMwKTgpmZlZwUjAzs4KTgpmZFZwUzMys4KRgZmYFJwUzMys4KZiZWcFJwczMCk4KZmZWcFIwM7OCk4KZmRWcFMzMrOCkYGZmBScFMzMrOCmYmVnBScHMzApOCmZmVmhaUpC0tqSpkv4k6RFJZ+TyLSXdL+mvkq6RtGYuXyu/n5XHD29WbGZmVlszrxSWAHtFxPbASGB/STsD3wXGRcQI4HnguFz/OOD5iNgKGJfrmZlZL2paUojkpfx2UH4FsBdwXS6fCByUh8fk9+Txe0tSs+IzM7MVNfWegqTVJU0HFgC3AX8DXoiIpbnKHGBIHh4C
PA2Qxy8C3tTM+MzMbHlNTQoRsSwiRgJDgVHANrWq5b+1rgqiukDSWEnTJE1buHBhzwVrZma98/RRRLwA3AHsDAyWtEYeNRSYm4fnAMMA8vgNgedqzGt8RLRHRHtbW1uzQzczG1Ca+fRRm6TBeXgdYB/gMeB3wCG52tHATXl4cn5PHn97RKxwpWBmZs2zRtdVum1TYKKk1UnJ59qIuFnSo8DVkr4J/BG4NNe/FLhC0izSFcLhTYzNzMxqaFpSiIgZwLtrlD9Bur9QXf4qcGiz4jEzs675G81mZlZwUjAzs4KTgpmZFZwUzMys4KRgZmYFJwUzMys4KZiZWcFJwczMCk4KZmZWcFIwM7OCk4KZmRWcFMzMrOCkYGZmBScFMzMrOCmYmVnBScHMzApOCmZmVnBSMDOzgpOCmZkVnBTMzKzQtKQgaZik30l6TNIjkr6Qy0+X9A9J0/PrwNI0X5E0S9LjkvZrVmxmZlbbGk2c91LgixHxkKT1gQcl3ZbHjYuIs8uVJW0LHA5sB2wG/FbS1hGxrIkxmplZSdOuFCJiXkQ8lIcXA48BQzqZZAxwdUQsiYgngVnAqGbFZ2ZmK+qVewqShgPvBu7PRSdImiHpMklvzGVDgKdLk82hRhKRNFbSNEnTFi5c2MSozcwGnqYnBUnrAdcDJ0XEi8DFwNuAkcA84JxK1RqTxwoFEeMjoj0i2tva2poUtZnZwNTUpCBpECkhXBkRNwBExPyIWBYRrwGX0NFENAcYVpp8KDC3mfGZmdnymvn0kYBLgcci4txS+aalah8BZubhycDhktaStCUwApjarPjMzGxFzXz6aDfgKOBhSdNz2VeBIySNJDUNzQY+DRARj0i6FniU9OTS8X7yyMysdzUtKUTE3dS+T3BLJ9OcBZzVrJjMzKxz/kazmZkVnBTMzKzgpGBmZgUnBTMzKzgpmJlZwUnBzMwKTgpmZlZwUjAzs4KTgpmZFZwUzMys4KRgZmYFJwUzMys4KZiZWcFJwczMCk4KZmZWaCgpSJrSSJmZma3aOv2RHUlrA28ANpb0Rjp+NGcDYLMmx2ZmZr2sq19e+zRwEikBPEhHUngRuKiJcZmZWQt0mhQi4jzgPEmfj4gLeikmMzNrkYZ+ozkiLpC0KzC8PE1ETKo3jaRhwCTgLcBrwPiIOE/SRsA1eV6zgcMi4nlJAs4DDgReAY6JiIe6sU5mZtZNjd5ovgI4G9gd2Cm/2ruYbCnwxYjYBtgZOF7StsCpwJSIGAFMye8BDgBG5NdY4OKVWxUzM3u9GrpSICWAbSMiGp1xRMwD5uXhxZIeA4YAY4A9crWJwB3Al3P5pLyM+yQNlrRpno+ZmfWCRr+nMJPUDNQtkoYD7wbuB95cOdDnv5vkakOAp0uTzcll1fMaK2mapGkLFy7sbkhmZlZDo1cKGwOPSpoKLKkURsSHu5pQ0nrA9cBJEfFiunVQu2qNshWuTCJiPDAeoL29veErFzMz61qjSeH07sxc0iBSQrgyIm7IxfMrzUKSNgUW5PI5wLDS5EOBud1ZrpmZdU+jTx/dubIzzk8TXQo8FhHnlkZNBo4GvpP/3lQqP0HS1cBoYJHvJ5iZ9a6GkoKkxXQ05awJDAJejogNOplsN+Ao4GFJ03PZV0nJ4FpJxwF/Bw7N424hPY46i/RI6rErsR5mZtYDGr1SWL/8XtJBwKguprmb2vcJAPauUT+A4xuJx8zMmqNbvaRGxI3AXj0ci5mZtVijzUcHl96uRvregp/8MTPrZxp9+uhDpeGlpO4pxvR4NGZm1lKN3lPwTV8zswGg0b6Phkr6haQFkuZLul7S0GYHZ2ZmvavRG82Xk75HsBmp64lf5jIzM+tHGk0KbRFxeUQsza8JQFsT4zIzsxZoNCk8I+lISavn15HAs80MzMzMel+jTx/9D3AhMI70KOq99PFvHO/4pbq//7PKevD7n2h1CGbWzzWaFL4BHB0RzwPkX087m5QszMysn2i0+ehdlYQA
EBHPkX4fwczM+pFGk8Jqkt5YeZOvFBq9yjAzs1VEowf2c4B7JV1HuqdwGHBW06IyM7OWaPQbzZMkTSN1gifg4Ih4tKmRmZlZr2u4CSgnAScCM7N+rFtdZ5uZWf/kpGBmZgU/QTQA/P3Md7Y6hB63+dcfbnUIZv2SrxTMzKzQtKQg6bLc1fbMUtnpkv4haXp+HVga9xVJsyQ9Lmm/ZsVlZmb1NfNKYQKwf43ycRExMr9uAZC0LXA4sF2e5oeSVm9ibGZmVkPTkkJE3AU812D1McDVEbEkIp4EZgGjmhWbmZnV1op7CidImpGblypdZwwBni7VmZPLViBprKRpkqYtXLiw2bGamQ0ovZ0ULgbeBowE5pG6z4D0LelqUWsGETE+Itojor2tzb/zY2bWk3o1KUTE/IhYFhGvAZfQ0UQ0BxhWqjoUmNubsZmZWS8nBUmblt5+BKg8mTQZOFzSWpK2BEYAU3szNjMza+KX1yRdBewBbCxpDnAasIekkaSmodnApwEi4hFJ15L6VloKHB8Ry5oVm5nBne99X6tD6HHvu+vOVoewymtaUoiII2oUX9pJ/bNwd9xmZi3lbzSbmVnBScHMzApOCmZmVnBSMDOzgpOCmZkVnBTMzKzgpGBmZgUnBTMzKzgpmJlZwUnBzMwKTgpmZlZwUjAzs4KTgpmZFZwUzMys4KRgZmYFJwUzMys4KZiZWcFJwczMCk4KZmZWaFpSkHSZpAWSZpbKNpJ0m6S/5r9vzOWSdL6kWZJmSNqhWXGZmVl9zbxSmADsX1V2KjAlIkYAU/J7gAOAEfk1Fri4iXGZmVkdTUsKEXEX8FxV8RhgYh6eCBxUKp8UyX3AYEmbNis2MzOrrbfvKbw5IuYB5L+b5PIhwNOlenNy2QokjZU0TdK0hQsXNjVYM7OBpq/caFaNsqhVMSLGR0R7RLS3tbU1OSwzs4Glt5PC/EqzUP67IJfPAYaV6g0F5vZybGZmA15vJ4XJwNF5+GjgplL5J/JTSDsDiyrNTGZm1nvWaNaMJV0F7AFsLGkOcBrwHeBaSccBfwcOzdVvAQ4EZgGvAMc2Ky4zM6uvaUkhIo6oM2rvGnUDOL5ZsZiZWWP6yo1mMzPrA5wUzMys4KRgZmYFJwUzMys4KZiZWcFJwczMCk4KZmZWcFIwM7OCk4KZmRWcFMzMrOCkYGZmBScFMzMrOCmYmVnBScHMzApOCmZmVnBSMDOzgpOCmZkVmvbLa2Zmq4oLv/jLVofQ404450Pdms5XCmZmVmjJlYKk2cBiYBmwNCLaJW0EXAMMB2YDh0XE862Iz8xsoGrllcKeETEyItrz+1OBKRExApiS35uZWS/qS81HY4CJeXgicFALYzEzG5BalRQC+I2kByWNzWVvjoh5APnvJi2KzcxswGrV00e7RcRcSZsAt0n6c6MT5iQyFmDzzTdvVnxmZgNSS64UImJu/rsA+AUwCpgvaVOA/HdBnWnHR0R7RLS3tbX1VshmZgNCrycFSetKWr8yDOwLzAQmA0fnakcDN/V2bGZmA10rmo/eDPxCUmX5P4uIWyU9AFwr6Tjg78ChLYjNzGxA6/WkEBFPANvXKH8W2Lu34zEzsw596ZFUMzNrMScFMzMrOCmYmVnBScHMzApOCmZmVnBSMDOzgpOCmZkVnBTMzKzgpGBmZgUnBTMzKzgpmJlZwUnBzMwKTgpmZlZwUjAzs4KTgpmZFZwUzMys4KRgZmYFJwUzMys4KZiZWcFJwczMCn0uKUjaX9LjkmZJOrXV8ZiZDSR9KilIWh24CDgA2BY4QtK2rY3KzGzg6FNJARgFzIqIJyLi38DVwJgWx2RmNmAoIlodQ0HSIcD+EfHJ/P4oYHREnFCqMxYYm9/+F/B4rwe6oo2BZ1odRB/hbdHB26KDt0WHvrAttoiItloj1ujtSLqgGmXLZa2IGA+M751wGiNpWkS0tzqOvsDbooO3RQdviw59fVv0teajOcCw0vuhwNwWxWJmNuD0
taTwADBC0paS1gQOBya3OCYzswGjTzUfRcRSSScA/w9YHbgsIh5pcViN6FPNWS3mbdHB26KDt0WHPr0t+tSNZjMza62+1nxkZmYt5KRgZmaFAZ8UJL2U/24m6bpuzuMkSW8ovb9F0uBO6k/I38noNZJOlPSYpCubNP/hkmY2UOdjpfftks5vRjzWN1Tvc+v7BnxSqIiIuRHR3QP1SUCRFCLiwIh4oWci6zGfAw6MiI+3MIbhQHGAiIhpEXFi68JpDUl7SLq51XH0kuGU9vnrlbvC6RMkfUbSJ/Jwj57oSTpd0il5+ExJ+3RSt0eX7aSQlc90JR0j6SZJt+bO+U7L5etK+pWkP0maKemjkk4ENgN+J+l3ud5sSRvn4U9ImpGnuaLGcr+Rd2rT9oWkHwFvBSZL+qKkG3NM90l6V65TfAjz+5l5mwzPVxiXSHpE0m8krZPr7JjX6w/A8aVph0v6vaSH8mvXPOo7wHskTZd0cvngKGmjTuK6TNIdkp7I29t6SfXnV9IWkqbksimSNs/1Jkg6X9K9eT9VDlLV+3xtSZdLeljSHyXtmac/RtKFpeXeLGmPPPxSPjDeD+zSu1ugvoj4UURM6oXlfD0iftvs5VQ4KdQ3Cvg4MBI4VFI7sD8wNyK2j4h3ALdGxPmkL9jtGRF7lmcgaTvga8BeEbE98IWq8d8DNgGOjYjXmrUiEfGZSoykM7c/RsS7gK8CjXyoRwAXRcR2wAvAf+fyy4ETI6L6H3UB8P6I2AH4KFBpIjoV+H1EjIyIcVXTnNFJXG8H9iPtk9MkDWogZgAkHSlpaj4o/VjScZLGlcZ/StK5derWPSvNB6rvSnpQ0m8ljSolrg/nOvWSY3k+O+WD41vzScdlkh7IZWNyna4OmOfk+U+RVLPrgu6o8/m9EJiU99OVdOxbgE2B3YEPkpIBrLjPjweIiHcCRwATJa3dRSjrAjMjYnRE3N0za7eieidA+TPyQE6M1ys3FVefSHUx79n58zI1v7bK5TWTbNW0xZWApO9IejTXP7tU7b01EnK3OCnUd1tEPBsR/wJuIH3YHwb2yTv3PRGxqIt57AVcFxHPAETEc6Vx/wcMjohPR+8+F7w7cEWO53bgTZI27GKaJyNieh5+EBiepxkcEXfm8vJV0CDgEkkPAz8n9Xj7euL6VUQsydtxAfDmBuaHpG1ISWm3iBgJLAOWAh8uJZZjgcvr1O2sqW1d4I6I2BFYDHwTeD/wEeDMXKdecqzEtyvwI2BMRDxBOgDfHhE7kRL49yWt28Vqrgs8lJdxJ3BaF/VXRq3P7y7Az/L4K0j7reLGiHgtIh6l/j4q7+c/A08BW3cRxzLg+m6twcqrdQJ0Q0TslBPjY8Bx3Zz3ixExipRYf5DLOkuyy5G0EenztV2u/83S6FoJuVv61JfX+pjqA3VExF8k7QgcCHxb0m8i4swa01aoxnwqHgB2lLRRVbJotnr9Sy1l+ZOE8tnbktLwMmAdOl+3k4H5wPZ5nq++jrhqLb/Rz+3ewI7AA5Igxb0AuB34oKTHgEER8bDSlyZr1a3n38CtefhhYElE/CcnwuG5fBBwoaRKkikf/LYhfYlp34iodOWyLylhVc4+1wZWOHOs8hpwTR7+KekEpqd0to8ryuPL+6nW/uysvLPP36sRsayLOHrKCidAwDskfRMYDKxH+nJtd1xV+lu5Wt0FODgPXwF8r5PpXyT9L/1E0q+A8n2pG3Nrw6OSGjppqsdXCvW9X6mdex3gIOAeSZsBr0TET4GzgR1y3cXA+jXmMQU4TNKboMj0FbeSMvqvJNWatlnuIp8B5yaIZyLiRWA2eX0k7QBs2dlM8o30RZIqZ4rls+oNgXn5Q3oU6dvpUH87dRbX6yFgYm66GBkR/xURpwM/AY4hXyV0Ubee/5Su8F4jHxDzOleSVjk5tgNrlqafR/oHf3dVvP9dimHziHiMzg+Y1XryqrPW5/deUvczkPZXV8051fu8vJ+3JiW9x0mfv5GS
VpM0jNRU2Aq1TkAmACfkJq8z6Hz7dybqDNers/yIiKWk7XI96Zh0a2l0Iwm5IU4K9d1NytzTgesjYhrwTmCqpOmkS/3K5dt44NfKN5orchcdZwF3SvoTcG7V+J8Dl5BuAK/TzJUpOR1olzSDlJSOzuXXAxvldfss8JcG5nUscJHSjeZ/lcp/CBwt6T7S2fHLuXwGsDS3zZ7cYFyvxxTgEEmbQHEze4uIuJ/U8eLH6Dh7q1n3dS6/XnKE1DTxAeBblfsDpDPQzytfqkiqJIzZ1D9grgZU2pA/RtcH6YbV+fyeCByb99NRVN0nq6F6n/8QWD1fUV0DHBMRS4B7gCdJV11nAw/11Hr0gPWBebnJ8fU8vffR0t8/5OGGk6yk9YANI+IW0hOPI19HLPVFhF9VL9JZ5IWtjsOvHtmXHyUl9hmk5oCdc/mpwNWN1K0z35dKw6cDp1SPI7VPzwDuA75dKt8DuDkPbw48AowmNVn9mHRgnFmqI1J78yOkA+kdwB6VZQHfyPHeDrS1epuvqi9SU9HM0vtT8r79LClh3QFcAEyo3u+kq4lDOpn3bNL9nvtJTcdblZZ5e/6cTAE2rzdv0n2Dqbnuw8DRtZZd/mx25+W+j2qQdAzQHqUf97H+RelR2HERMaXVsbwekl6KiPVaHYd1TtJs0jGl1T+u0yU3H9UQEROcEPonSYMl/QX416qeEMyawVcKZp1Q+sLUWlXFR0XEw62Ix/o2Sb9gxYc0vhwR3X1iqdc5KZiZWcHNR2ZmVnBSMDOzgpOCrbIkvUmpn6Lpkv4p6R+l92t2PYcei2Pb/Bz+HyUNb+Jy1pDUrd53Je0gaf+ejsn6H3dzYausiHiW/AUeSaeTns8+u9OJmuNgUh9B3+jOxJLWiPRt1WbaAXgHy38L1mwFvlKwfkfStyWVu/L+rqTPSdpH0u+Uuuh+VNJFpW8PHyDpD0q9jV5TqyO6fLZ9f+6h8npJGyr1iHoC8BlJv62qv4akFySNy/O9rdRlxN2SzpJ0F3CCpC1zbDNyvaG53tvyMh8gfaGpMu99JN1Yev8jSUfm4dF5Xf6Up10X+Drw8XwV1as/8GSrFicF648qfRtVfpTlUDq6sxhN6iLgnaRO6cbkri1OBfaO1NvoDPGTWGsAAAI0SURBVGp33/BT4IuReqh8HPi/iJicl/f9iKj1QygbAvfl+f6B1DtuxQYR8d6I+AGp+4ef5Hn/nI5eNC8AzovUc+rCrlZcqRvqq4HjI/XquS+pj6UzgSsj9anUrV8YtIHBScH6nYj4G7BY0juBA4CpEfF8Hn1fRMyO1Ovm1aTuhnclde99b+776eN09HQKpPsXwNrR0Z//ROC9DYSzlHSQh5RUyl1NX10aHl16Pwl4Tx7ehY5eUFf4kaYatgH+HhEPAUTEoui9HkatH/A9BeuvLiVdLQwn9SdUsUKX6KS+hW6NiKM6mV93e56stbyKl+la1JgH1O85tZHurs3q8pWC9VfXAx8i3Ygut/XvLGnz3Kx0GKlXynuB90l6KxQ/uzqiPLPcZ82/1PHraUeRftSmK4Po6C+/s15M78vxABxJ6mK6urzcQ+dTwHaS1pT0RtIP4kDqNG8Lpe7PkbRBXtfOui03KzgpWL8UEa+SDqxXxfI/dXovcA6pl8m/AJMjYj7p17SuyV1E30vtXwM7ChiXu43eluV/+aqeRcAOkh4iNR3Vm+YEYGye90dJv8UAqavqkyVNJf3AS2X9ngRuzOsxidzVdKRuqI8ALs7r8htSNx23A9vnx2Z9o9nqcjcX1i9JWo3UDfZBkX7qEkn7kH4s5aBeimEN0o8FDe6N5Zn1BF8pWL+TbzD/jXSf4IlWx2O2KvGVgpmZFXylYGZmBScFMzMrOCmYmVnBScHMzApOCmZmVvj/89OdIHPx6qcAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.countplot(x='product_type',data=df)\n",
    "plt.xlabel('Type of product')\n",
    "plt.title('Number of products in each group');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The eye makeup group is the largest: with 367 products it is more than twice the size of the next largest group, lipstick (176)."
   ]
  },
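  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the classes are imbalanced, a plain random split can leave the smallest class under-represented in the test set. One common remedy is a stratified split. The cell below is a sketch on made-up labels that reuse the class counts reported above; the `*_demo` variable names, the 80/20 split and the random seed are assumptions for illustration, not necessarily what this notebook uses for the real data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# Made-up labels with the class counts reported above.\n",
    "demo_labels = (['eye_makeup'] * 367 + ['lipstick'] * 176 + ['foundation'] * 159\n",
    "               + ['contour'] * 144 + ['nail_polish'] * 60)\n",
    "# Placeholder texts, one per label.\n",
    "demo_texts = ['description %d' % i for i in range(len(demo_labels))]\n",
    "\n",
    "# stratify keeps each class's share approximately equal in both parts.\n",
    "X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(\n",
    "    demo_texts, demo_labels, test_size=0.2, stratify=demo_labels, random_state=42)\n",
    "\n",
    "print(Counter(y_test_demo))"
   ]
  },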
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Description length across all categories:**\n",
    "\n",
    "Histogram of description length:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfQAAAE9CAYAAAD9MZD2AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3de3wV1bn/8c9DiISIihrgBBETzs8Ll3CJAVQUFSyiUryArRaBYCt6EKl6PIKXKtqbtlgsR1vRioBFRbEqoj1KFQ7eCgICiqhwNFwOnEqhIFcL8vz+mEncQC57h+zsncn3/XrtV2bWzKx5Zml4smZmr2XujoiIiNRtDVIdgIiIiBw6JXQREZEIUEIXERGJACV0ERGRCFBCFxERiQAldBERkQhomOoADkVOTo7n5eWlOgwREZFas2jRor+7e7MDy+t0Qs/Ly2PhwoWpDkNERKTWmNnq8sqTdsvdzCaZ2Zdm9lE5224xMzeznHDdzGyCma0ys2VmVpisuERERKIomc/QJwN9Dyw0s+OB7wBrYoovAE4MP8OB3ycxLhERkchJWkJ393nA5nI2jQduBWLHnL0YmOqBvwJNzSw3WbGJiIhETa0+Qzez/sD/uvtSM4vddBywNmZ9XVi2oZw6hhP04mndunXyghURqYf27NnDunXr2L17d6pDqfeysrJo1aoVmZmZce1fawndzLKBO4A+5W0up6zcWWPc/VHgUYCioiLNLCMiUoPWrVvHEUccQV5eHgd0vKQWuTubNm1i3bp15Ofnx3VMbX4P/V+BfGCpmZUArYDFZvYvBD3y42P2bQWsr8XYREQE2L17N8cee6ySeYqZGccee2xCd0pqLaG7+4fu3tzd89w9jyCJF7r7/wEzgSHh2+6nAVvd/aDb7SIiknxK5ukh0f8Oyfza2tPAe8DJZrbOzH5Yye6vAp8Dq4DHgBHJiktERCSKkvYM3d2vrGJ7XsyyA9cnKxYREamea1++tkbrm/jdiTVaX22aO3cu48aNY9asWakOpVway11ERCQClNBFRCTt/PGPf6Rbt2507tyZa6+9lscff5ybbrqpbPtjjz3GzTffXO6+33zzTYX1NmnShNGjR3Pqqady3nnnsWDBAs455xzatGnDzJkzASgpKeGss86isLCQwsJC3n333YPqef/99+nSpQuff/45O3bs4Oqrr6Zr16506dKFl156CYDJkyczcuTIsmP69evH3Llzy+L493//dwoLC+nduzcbN2485Dar02O517Rra/bOUlJMrLt3q0RE4rJixQqmT5/OO++8Q2ZmJiNGjKBhw4bMnDmTX/3qV2RmZvLEE08wceLEcvedNm0aQ4YMKbfuHTt2cM4553D//fdz6aWXcueddzJ79mw+/vhjhg4dSv/+/WnevDmzZ88mKyuLlStXcuWVV+43b8i7777LDTfcwEsvvUTr1q25/fbb6dWrF5MmTWLLli1069aN8847r9Jr3LFjB4WFhTzwwAPce++93HPPPTz00EOH1G5K6CIiklbeeOMNFi1aRNeuXQHYtWsXzZs3p1evXsyaNYu2bduyZ88eCgoKeOihh8rdtyKHHXYYffsGo5IXFBTQqFEjMjMzKSgooKSkBAgG1xk5ciRLliwhIyODzz77rOz4FStWMHz4cF5//XVatmwJwOuvv87MmTMZN24cEHz1b82aNVSmQYMGfP/73wfgqquu4rLLLqtGS+1PCV1ERNKKuzN06FB++ctf7lc+f/58fvGLX3DKKacwbNiwSvetSGZmZtnXwRo0aECjRo3Klvfu3QvA+PHjadGiBUuXLmXfvn1kZWWVHZ+bm8vu3bv54IMPyhK6u/P8889z8skn73euRYsWsW/fvrL1yr5TXhNfFdQzdBERSSu9e/dmxowZfPnllwBs3ryZ1atX0717d9auXctTTz3FlVdeWem+h2Lr1q3k5ubSoEEDnnzyyf2e
yTdt2pRXXnmF22+/vex5+Pnnn89//ud/EnxhCz744AMgmOJ7yZIl7Nu3j7Vr17JgwYKyevbt28eMGTMAeOqppzjzzDMPKWZQD11ERCqRiq+ZtWvXjp/97Gf06dOHffv2kZmZycMPP8wJJ5zA9773PZYsWcLRRx9d5b7VNWLECAYMGMBzzz3Hueeey+GHH77f9hYtWvDyyy9zwQUXMGnSJH7yk59w44030rFjR9ydvLw8Zs2aRY8ePcjPz6egoIAOHTpQWPjtzOCHH344y5cv59RTT+Woo45i+vTp1Y63lJX+RVEXFRUVeeyLCodKL8WJSH23YsUK2rZtm+owKtSvXz9uuukmevfunepQDkmTJk3Yvn17lfuV99/DzBa5e9GB++qWu4iIpL0tW7Zw0kkn0bhx4zqfzJNFt9xFRCTtNW3adL+3zavSvXt3vv766/3KnnzySQoKCmo6tGqJp3eeKCV0ERGJnPnz56c6hFqnW+4iIiIRoIQuIiISAUroIiIiEaCELiIiaaVJkyYArF+/noEDB1arjgcffJCdO3eWrV944YVs2bKlwv2Li4vLBnqpq/RSnIiIVKimx+dIZCyNli1bVjvJPvjgg1x11VVkZ2cD8Oqrr1arnrpEPXQREUlLJSUldOjQAQimIr344ovp27cvJ598Mvfccw8QzFp20UUX0alTJzp06MD06dOZMGEC69ev59xzz+Xcc88FgmFY//73vwMwdepUOnbsSKdOnRg8ePBB5/3JT35CcXHxfuOw1wXqoYuISJ2wYMECPvroI7Kzs+natSsXXXQRq1evpmXLlrzyyitAMA77UUcdxW9+8xvmzJlDTk7OfnUsX76cn//857zzzjvk5OSwefPm/bbfeuutbN26lSeeeKJGJkypTeqhi4hInfCd73yHY489lsaNG3PZZZfx9ttvU1BQwF/+8hdGjx7NW2+9xVFHHVVpHW+++SYDBw4sS/THHHNM2baf/vSnbNmyhYkTJ9a5ZA5K6CIiUkccmGTNjJNOOolFixZRUFDAbbfdxr333ltpHe5eYbLu2rUrixYtOqjXXlcooYuISJ0we/ZsNm/ezK5du3jxxRfp0aMH69evJzs7m6uuuopbbrmFxYsXA3DEEUewbdu2g+ro3bs3zz77LJs2bQLYL3n37duXMWPGcNFFF5V7bLrTM3QREakTzjzzTAYPHsyqVav4wQ9+QFFREa+99hr/8R//QYMGDcjMzOT3v/89AMOHD+eCCy4gNzeXOXPmlNXRvn177rjjDs4++2wyMjLo0qULkydPLtt++eWXs23bNvr378+rr75K48aNa/syq03Tp8bQ9KkiUt+l6/SpkydPZuHChTz00EOpDqVWafpUERGReka33EVEJO0VFxdTXFyc6jDSmnroIiIiEaCELiIiEgFK6CIiIhGghC4iIhIBSUvoZjbJzL40s49iyn5tZp+Y2TIze8HMmsZsu83MVpnZp2Z2frLiEhGR9DZhwgTatm3LoEGDklJ/7KQvle3z1FNPla0vXLiQUaNGJSWempLMt9wnAw8BU2PKZgO3ufteM7sfuA0YbWbtgCuA9kBL4C9mdpK7f5PE+EREpCopmD/1d7/7HX/+85/Jz8+v2XMnoDSh/+AHPwCgqKiIoqKDvvqdVpLWQ3f3ecDmA8ped/e94epfgVbh8sXAM+7+tbt/AawCuiUrNhERSU/XXXcdn3/+Of379+eBBx7gkksuoWPHjpx22mksW7YMgLFjxzJu3LiyYzp06EBJSQklJSW0bduWa665hvbt29OnTx927doFwKJFi+jUqROnn346Dz/8cNmxJSUlnHXWWRQWFlJYWMi7774LwJgxY3jrrbfo3Lkz48ePZ+7cufTr1w8IhoutKK6rr76ac845hzZt2jBhwoRaabNSqXyGfjXw53D5OGBtzLZ1YdlBzGy4mS00s4UbN25McogiIlKbHnnkEVq2bMmcOXMoKSmhS5cuLFu2jF/84hcMGTKkyuNX
rlzJ9ddfz/Lly2natCnPP/88AMOGDWPChAm89957++3fvHlzZs+ezeLFi5k+fXrZbfX77ruPs846iyVLlnDTTTftd8zdd99dYVyffPIJr732GgsWLOCee+5hz549h9okcUtJQjezO4C9wLTSonJ2K3dMWnd/1N2L3L2oWbNmyQpRRERS7O2332bw4MEA9OrVi02bNrF169ZKj8nPz6dz584AnHrqqZSUlLB161a2bNnC2WefDVBWJ8CePXu45pprKCgo4PLLL+fjjz8+pLguuugiGjVqRE5ODs2bN+dvf/tb4hdeTbU+UpyZDQX6Ab3924Hk1wHHx+zWClhf27GJiEj6KG+uETOjYcOG7Nu3r6xs9+7dZcuNGjUqW87IyGDXrl2VTpk6fvx4WrRowdKlS9m3bx9ZWVnVjqu88+/du/egfZOlVnvoZtYXGA30d/edMZtmAleYWSMzywdOBBbUZmwiIpJeevbsybRpwY3cuXPnkpOTw5FHHkleXl7ZNKmLFy/miy++qLSepk2bctRRR/H2228DlNUJsHXrVnJzc2nQoAFPPvkk33wTvItd0fSrlcWVaknroZvZ08A5QI6ZrQPuJnirvREwO/xr5q/ufp27LzezZ4GPCW7FX6833EVE6rexY8cybNgwOnbsSHZ2NlOmTAFgwIABTJ06lc6dO9O1a1dOOumkKut64oknuPrqq8nOzub887/9ZvSIESMYMGAAzz33HOeeey6HH344AB07dqRhw4Z06tSJ4uJiunTpUmVcqabpU2No+lQRqe/SdfrU+krTp4qIiNQzSugiIiIRoIQuIiISAUroIiIiEVDr30NPd4PmpdebcdN66i04ERGpmnroIiIiEaCELiIikXPg9Kf1gW65i4hIJWr6MWTtPEY8cPrTQ/XNN9+QkZFRI3Uli3roIiKSdqZOnUrHjh3p1KkTgwcPZvXq1fTu3ZuOHTvSu3dv1qxZA0BxcTGjRo3ijDPOoE2bNsyYMQM4ePrT3bt3M2zYMAoKCujSpQtz5swBYPLkyYwcObLsvP369WPu3LkANGnShLvuuovu3bsfNEtbOlIPXURE0sry5cv5+c9/zjvvvENOTg6bN29m6NChDBkyhKFDhzJp0iRGjRrFiy++CMCGDRt4++23+eSTT+jfvz8DBw7kvvvuY9y4ccyaNQuABx54AIAPP/yQTz75hD59+vDZZ59VGseOHTvo0KED9957b3IvuIaohy4iImnlzTffZODAgeTk5ABwzDHH8N5775XdPh88eHDZRCsAl1xyCQ0aNKBdu3YVTlcaO+XpKaecwgknnFBlQs/IyGDAgAE1cUm1QgldRETSSmXTnZaK3R47ZWlF85NUVF7ZVKxZWVlp/9w8lhL6IdiwfUNCHxERqVrv3r159tln2bRpEwCbN2/mjDPO4JlnngGC6U/PPPPMSus4cPrT2ClPP/vsM9asWcPJJ59MXl4eS5YsYd++faxdu5YFC+ruzN16hi4iImmlffv23HHHHZx99tlkZGTQpUsXJkyYwNVXX82vf/1rmjVrxhNPPFFpHQdOfzpixAiuu+46CgoKaNiwIZMnT6ZRo0b06NGD/Px8CgoK6NChA4WFhbV0lTVP06fGuPbaxEaKS7TXndskN9GQDhopTtOnikgyafrU9KLpU0VEROoZJXQREZEIUEIXERGJACV0ERGRCFBCFxERiQAldBERkQhQQhcREYkADSwjIiIV+6ykZus7Ka9m6wMeeeQRsrOzGTJkCMXFxfTr14+BAwfWSN1jx46lSZMm3HLLLdx111307NmT8847r9x9a/rciVJCFxGROu26666rlfOk+6xruuUuIiJppaSkhLZt23LNNdfQvn17+vTpw65du3jsscfo2rUrnTp1YsCAAezcuRMIetHjxo2Lq+68vDxGjx5Nt27d6NatG6tWrQKocL71WMXFxfvNt96uXTs6duzILbfcUrbPvHnzDpqbvbYooYuISNpZuXIl119/PcuX
L6dp06Y8//zzXHbZZbz//vssXbqUtm3b8vjjj1er7iOPPJIFCxYwcuRIbrzxRgBGjhzJkCFDWLZsGYMGDWLUqFEVHr9582ZeeOEFli9fzrJly7jzzjvLtpXOzT5r1izGjBlTrfiqSwldRETSTn5+Pp07dwbg1FNPpaSkhI8++oizzjqLgoICpk2bxvLly6tV95VXXln287333gOodL71Ax155JFkZWXxox/9iD/96U9kZ2eXbYtnbvZkUUIXEZG0EzvHeUZGBnv37qW4uJiHHnqIDz/8kLvvvnu/ucsTETuXekXzrlc2H3vDhg1ZsGABAwYM4MUXX6Rv377lxl3bk58poYuISJ2wbds2cnNz2bNnT9nc5tUxffr0sp+nn346QELzrW/fvp2tW7dy4YUX8uCDD7JkyZJqx1KTkvaWu5lNAvoBX7p7h7DsGGA6kAeUAN9z939Y8KfQb4ELgZ1AsbsvTlZsIiISpyR8zay6fvrTn9K9e3dOOOEECgoK2LZtW7Xq+frrr+nevTv79u3j6aefBkhovvVt27Zx8cUXs3v3btyd8ePHVyuOmpa0+dDNrCewHZgak9B/BWx29/vMbAxwtLuPNrMLgRsIEnp34Lfu3r2qc2g+dBGRmhX1+dDz8vJYuHAhOTk5qQ4lLmkxH7q7zwM2H1B8MTAlXJ4CXBJTPtUDfwWamlni2U9ERKSequ2BZVq4+wYAd99gZs3D8uOAtTH7rQvLDuoCm9lwYDhA69atkxutiIjUSZdeeilffPHFfmX3338/JSUlqQmoFqTLSHHlvU5Y7rMAd38UeBSCW+7JDEpEROqmF154IdUh1Lrafsv9b6W30sOfX4bl64DjY/ZrBayv5dhERITa/7qVlC/R/w61ndBnAkPD5aHASzHlQyxwGrC19Na8iIjUnqysLDZt2qSknmLuzqZNm8jKyor7mGR+be1p4Bwgx8zWAXcD9wHPmtkPgTXA5eHurxK84b6K4Gtrw5IVl4iIVKxVq1asW7eOjRs3pjqUei8rK4tWrVrFvX/SErq7X1nBpt7l7OvA9cmKRURE4pOZmUl+fn6qw5Bq0EhxIiIiEaCELiIiEgFK6CIiIhGghC4iIhIBSugiIiIRoIQuIiISAUroIiIiEaCELiIiEgFK6CIiIhGghC4iIhIBSugiIiIRoIQuIiISAUroIiIiEaCELiIiEgFK6CIiIhGghC4iIhIBSugiIiIRoIQuIiISAUroIiIiEaCELiIiEgFK6CIiIhGghC4iIhIBSugiIiIR0DDVAaSbE8fOi3vf4/bsTKju7MyViYbDoNxry5anTZuY8PEiIlI/xNVDN7MOyQ5EREREqi/eW+6PmNkCMxthZk2TGpGIiIgkLK6E7u5nAoOA44GFZvaUmX0nqZGJiIhI3OJ+Kc7dVwJ3AqOBs4EJZvaJmV2WrOBEREQkPvE+Q+9oZuOBFUAv4Lvu3jZcHp/E+ERERCQO8fbQHwIWA53c/Xp3Xwzg7usJeu0JMbObzGy5mX1kZk+bWZaZ5ZvZfDNbaWbTzeywROsVERGpr+JN6BcCT7n7LgAza2Bm2QDu/mQiJzSz44BRQJG7dwAygCuA+4Hx7n4i8A/gh4nUKyIiUp/Fm9D/AjSOWc8Oy6qrIdDYzBqGdW0guH0/I9w+BbjkEOoXERGpV+JN6Fnuvr10JVzOrs4J3f1/gXHAGoJEvhVYBGxx973hbuuA46pTv4iISH0Ub0LfYWaFpStmdiqwqzonNLOjgYuBfKAlcDhwQTm7egXHDzezhWa2cOPGjdUJQUREJHLiHfr1RuA5M1sfrucC36/mOc8DvnD3jQBm9ifgDKCpmTUMe+mtgPXlHezujwKPAhQVFZWb9EVEROqbuBK6u79vZqcAJwMGfOLue6p5zjXAaeFLdbuA3sBCYA4wEHgGGAq8VM36RURE6p1EJmfpCuSFx3QxM9x9aqIndPf5ZjaD4Gtwe4EPCHrcrwDPmNnPwrLHE61bRESkvoor
oZvZk8C/AkuAb8JiBxJO6ADufjdw9wHFnwPdqlOfiIhIfRdvD70IaOfuemYtIiKShuJ9y/0j4F+SGYiIiIhUX7w99BzgYzNbAHxdWuju/ZMSlYiIiCQk3oQ+NplBiIiIyKGJ92tr/21mJwAnuvtfwq+cZSQ3NBEREYlXvNOnXkMwzvrEsOg44MVkBSUiIiKJifeluOuBHsBXAO6+EmierKBEREQkMfEm9K/d/Z+lK+EsafoKm4iISJqIN6H/t5ndTjDl6XeA54CXkxeWiIiIJCLehD4G2Ah8CFwLvArcmaygREREJDHxvuW+D3gs/IiIiEiaiXcs9y8o55m5u7ep8YhEREQkYYmM5V4qC7gcOKbmwxEREZHqiOsZurtvivn8r7s/CPRKcmwiIiISp3hvuRfGrDYg6LEfkZSIREREJGHx3nJ/IGZ5L1ACfK/GoxEREZFqifct93OTHYiIiIhUX7y33G+ubLu7/6ZmwhEREZHqSOQt967AzHD9u8A8YG0yghIREZHExJvQc4BCd98GYGZjgefc/UfJCkxERETiF+/Qr62Bf8as/xPIq/FoREREpFri7aE/CSwwsxcIRoy7FJiatKhEREQkIfG+5f5zM/szcFZYNMzdP0heWCIiIpKIeG+5A2QDX7n7b4F1ZpafpJhEREQkQXEldDO7GxgN3BYWZQJ/TFZQIiIikph4e+iXAv2BHQDuvh4N/SoiIpI24k3o/3R3J5xC1cwOT15IIiIikqh4E/qzZjYRaGpm1wB/AR5LXlgiIiKSiHjfch9nZt8BvgJOBu5y99lJjUxERETiVmVCN7MM4DV3Pw+okSRuZk2BPwAdCG7jXw18CkwnGLCmBPieu/+jJs4nIiISdVXecnf3b4CdZnZUDZ73t8B/ufspQCdgBTAGeMPdTwTeCNdFREQkDvGOFLcb+NDMZhO+6Q7g7qMSPaGZHQn0BIrDOv4J/NPMLgbOCXebAswl+KqciIiIVCHehP5K+KkJbYCNwBNm1glYBPwYaOHuGwDcfYOZNa+h84mIiERepQndzFq7+xp3n1LD5ywEbnD3+Wb2WxK4vW5mw4HhAK1bt67BsJJv556dce+bnZmdxEhERCRqqnqG/mLpgpk9X0PnXAesc/f54foMggT/NzPLDc+VC3xZ3sHu/qi7F7l7UbNmzWooJBERkbqtqlvuFrPcpiZO6O7/Z2Zrzexkd/8U6A18HH6GAveFP1+qifMlKpFetIiISLqoKqF7BcuH6gZgmpkdBnwODCO4W/Csmf0QWANcXoPnExERibSqEnonM/uKoKfeOFwmXHd3P7I6J3X3JUBROZt6V6c+ERGR+q7ShO7uGbUViIiIiFRfIvOhi4iISJpSQhcREYkAJXQREZEIUEIXERGJACV0ERGRCFBCFxERiQAldBERkQhQQhcREYkAJXQREZEIUEIXERGJACV0ERGRCFBCFxERiQAldBERkQhQQhcREYkAJXQREZEIUEIXERGJACV0ERGRCFBCFxERiQAldBERkQhQQhcREYkAJXQREZEIUEIXERGJACV0ERGRCFBCFxERiQAldBERkQhQQhcREYkAJXQREZEIUEIXERGJgJQldDPLMLMPzGxWuJ5vZvPNbKWZTTezw1IVm4iISF2Tyh76j4EVMev3A+Pd/UTgH8APUxKViIhIHZSShG5mrYCLgD+E6wb0AmaEu0wBLklFbCIiInVRqnroDwK3AvvC9WOBLe6+N1xfBxyXisBERETqolpP6GbWD/jS3RfFFpezq1dw/HAzW2hmCzdu3JiUGEVEROqaVPTQewD9zawEeIbgVvuDQFMzaxju0wpYX97B7v6ouxe5e1GzZs1qI14REZG0V+sJ3d1vc/dW7p4HXAG86e6DgDnAwHC3ocBLtR2biIhIXZVO30MfDdxsZqsInqk/nuJ4RERE6oyGVe+SPO4+F5gbLn8OdEtlPCIiInVVOvXQRUREpJqU0EVERCJACV1ERCQClNBFREQi
QAldREQkApTQRUREIkAJXUREJAKU0EVERCJACV1ERCQClNBFREQiQAldREQkApTQRUREIkAJXUREJAKU0EVERCJACV1ERCQClNBFREQiQAldREQkApTQRUREIqBhqgOQ8u3csxOADds3lJXNWz2Pa1+eVu7+E787sVbiEhGR9KSELjXo2lQHUAn9wSMi0aZb7iIiIhGghC4iIhIBSugiIiIRoIQuIiISAUroIiIiEaCELiIiEgFK6CIiIhGg76HXIWNHjSO3/coKtqbzd8BFRCTZ1EMXERGJgFpP6GZ2vJnNMbMVZrbczH4clh9jZrPNbGX48+jajk1ERKSuSkUPfS/w7+7eFjgNuN7M2gFjgDfc/UTgjXBdRERE4lDrCd3dN7j74nB5G7ACOA64GJgS7jYFuKS2YxMREamrUvoM3czygC7AfKCFu2+AIOkDzVMXmYiISN2SsrfczawJ8Dxwo7t/ZWbxHjccGA7QunXr5AUYYfNWz0to/54n9ExSJCIiUlNS0kM3s0yCZD7N3f8UFv/NzHLD7bnAl+Ud6+6PunuRuxc1a9asdgIWERFJc7XeQ7egK/44sMLdfxOzaSYwFLgv/PlSbcdWF2xYfmK55fNWJ1DH9vLrqEgidQP0VIdeRKTWpeKWew9gMPChmS0Jy24nSOTPmtkPgTXA5SmITUREpE6q9YTu7m8DFT0w712bsYiIiESFhn6NiA3bN6Q6BBERSSEN/SoiIhIB6qGnuaLNS1MdAtnbgglhVubqbTcRkXSlHrqIiEgEqIcep5yjK5+e9O//mFhLkYiIiBxMPXQREZEIUA9datSG7RuYt3pl3PtrWFkRkZqhHrqIiEgEqIcuNa6i4WnLk+iwstU1bdq3yxP1uoOIRJB66CIiIhGgHnodk4q37Xfu2QloNDoRkXSmHrqIiEgEqIcucUvGqHWlo9AlSqPWiYjsTz10ERGRCFBCFxERiQAldBERkQhQQhcREYkAJXQREZEI0FvuNaSq74eDZmQTEZHkUUKXlCodtCYe2ZnZSYxERKRu0y13ERGRCFBCFxERiQAldBERkQjQM3SpllROEgNw3Jr/qnTfA5+3D5oXE2/V7y8mTnOyikiKqYcuIiISAeqh16J4vtpWG9IljqW5UhgAAAoISURBVGQ68O352Klf562u3oQwpXqeoIlhatO1Lyf2/+vE7+puScKuTaN/E3S3q9rUQxcREYkA9dClXij65bdTv2Yf89WhVdZk3sFl89pWv75pNdzjr6EeTmnPeFDHcq73AMm8a1HZ+actO/i8ifTo1Zuv+9Lp5kJ5avOGg3roIiIiEZB2PXQz6wv8FsgA/uDu96U4pDolXZ6Pp/NQuDs3H3lIx2/YluD5qhgNb+Gr3z7fz22SC0DPOvaYft7qqnvxqTbo4YNjnPfwIdxZqYLetZDallY9dDPLAB4GLgDaAVeaWbvURiUiIpL+0q2H3g1Y5e6fA5jZM8DFwMcpjUokiWKf70Ow/D9xHlvu+PbTa6bXeWvpnYW3yjlvnO8hrGz/LzUSS3XMe3jQfuu9lm+oYM/k2DAs/rsWicxpUBMW3tYprv1y21f+jZDk3YWI/07joEFV73OoYr8lM3bCLZXum8o7M2nVQweOA9bGrK8Ly0RERKQS5u6pjqGMmV0OnO/uPwrXBwPd3P2GmH2GA8PD1ZOBT2swhBzg7zVYXxSpjeKjdqqa2qhqaqP41Ld2OsHdmx1YmG633NcBx8estwLWx+7g7o8Cjybj5Ga20N2LklF3VKiN4qN2qpraqGpqo/ionQLpdsv9feBEM8s3s8OAK4CZKY5JREQk7aVVD93d95rZSOA1gq+tTXL35SkOS0REJO2lVUIHcPdXgVdTdPqk3MqPGLVRfNROVVMbVU1tFB+1E2n2UpyIiIhUT7o9QxcREZFqUEInGG7WzD41s1VmNibV8dQ2M5tkZl+a2UcxZceY2WwzWxn+PDosNzObELbVMjMrjDlmaLj/SjMbmopr
SRYzO97M5pjZCjNbbmY/DsvVTiEzyzKzBWa2NGyje8LyfDObH17v9PCFV8ysUbi+KtyeF1PXbWH5p2Z2fmquKHnMLMPMPjCzWeG62ugAZlZiZh+a2RIzWxiW6fetMu5erz8EL9/9D9AGOIxgqK52qY6rltugJ1AIfBRT9itgTLg8Brg/XL4Q+DNgwGnA/LD8GODz8OfR4fLRqb62GmyjXKAwXD4C+IxgeGK107dtZECTcDkTmB9e+7PAFWH5I8C/hcsjgEfC5SuA6eFyu/D3sBGQH/5+ZqT6+mq4rW4GngJmhetqo4PbqATIOaBMv2+VfNRDjxlu1t3/CZQON1tvuPs8YPMBxRcDU8LlKcAlMeVTPfBXoKmZ5QLnA7PdfbO7/wOYDfRNfvS1w903uPvicHkbsIJgFEO1Uyi81u3hamb4caAXMCMsP7CNSttuBtDbzCwsf8bdv3b3L4BVBL+nkWBmrYCLgD+E64baKF76fauEErqGm61IC3ffAEEyA5qH5RW1V71px/C2ZxeCHqjaKUZ4K3kJ8CXBP57/A2xx973hLrHXW9YW4fatwLFEvI2AB4FbgX3h+rGojcrjwOtmtsiCEUJBv2+VSruvraWAlVOmV/8rVlF71Yt2NLMmwPPAje7+VdBZKn/Xcsoi307u/g3Q2cyaAi8A5c0UU3q99a6NzKwf8KW7LzKzc0qLy9m13rZRjB7uvt7MmgOzzeyTSvatz+1URj30OIabraf+Ft6yIvz5ZVheUXtFvh3NLJMgmU9z9z+FxWqncrj7FmAuwfPMpmZW2nmIvd6ytgi3H0Xw6CfKbdQD6G9mJQSP93oR9NjVRgdw9/Xhzy8J/jjshn7fKqWEruFmKzITKH0jdCjwUkz5kPCt0tOAreGtr9eAPmZ2dPjmaZ+wLBLC55aPAyvc/Tcxm9ROITNrFvbMMbPGwHkE7xrMAQaGux3YRqVtNxB404M3mWYCV4RveOcDJwILaucqksvdb3P3Vu6eR/BvzZvuPgi10X7M7HAzO6J0meD35CP0+1a5VL+Vlw4fgjckPyN43ndHquNJwfU/DWwA9hD8RftDgud0bwArw5/HhPsa8HDYVh8CRTH1XE3wcs4qYFiqr6uG2+hMglt1y4Al4edCtdN+bdQR+CBso4+Au8LyNgTJZhXwHNAoLM8K11eF29vE1HVH2HafAhek+tqS1F7n8O1b7mqj/dumDcFb/EuB5aX/Luv3rfKPRooTERGJAN1yFxERiQAldBERkQhQQhcREYkAJXQREZEIUEIXERGJACV0kRQzs+1V73VI9RebWcuY9RIzy0mwjrlmVlTz0ZXV/2rpd9gr2edGM8tO5BiR+kQJXST6ioGWVe2UCuFAIA3c/UIPRperzI1AWUKP8xiRekMJXSQNhaOuPW9m74efHmH5WAvmr59rZp+b2aiYY35iZp+E80Q/bWa3mNlAoAiYFs4r3Tjc/QYzWxzON31KOedvbGbPhHNLTwcax2zrY2bvhcc/F45vj5ndZ2Yfh8eMC8tamNkLFsyRvtTMzjCzPAvmlf8dsBg4vvSuQbjtEzObEtYzw8yyw+tsCcwxszlh3WV3GszsZjP7KPzcGJaVnucxC+Znfz3m+kWiJ9Uj2+ijT33/ANvLKXsKODNcbk0w5CzAWOBdgnmwc4BNBNOUFhGMXteYYL72lcAt4TFz2X/krBLghnB5BPCHcs5/MzApXO4I7A3PkQPMAw4Pt40G7iKYb/pTKBusqmn4czrBRDYAGQRjkecRzDR22gEx5YTbnGBiDoBJMddRQsz82DHHnEowOtjhQBOCkcW6hHXtBTqH+z8LXJXq/9766JOsj3roIunpPOAhC6YinQkcWTq2NfCKB/Ng/51gcooWBEPTvuTuuzyYr/3lKuovnVxmEUHiO1BP4I8A7r6MYDhXCCZbaQe8E8Y2FDgB+ArYDfzBzC4Ddob79wJ+H9bzjbtvDctXezBvdXnWuvs74fIfw2urzJnAC+6+w4P5
2P8EnBVu+8Ldl1RxrSKRoOlTRdJTA+B0d98VWxjMEcPXMUXfEPweVziPawVK6yg9vjzljQttwGx3v/KgDWbdgN4Ek46MJEjmFdlRybYDz1vV+NSVXfuBbaVb7hJZ6qGLpKfXCZIiAGbWuYr93wa+a2ZZ4TPti2K2bSO4DZ+IecCg8NwdCG67A/wV6GFm/y/clm1mJ4XnPMrdXyV4ea003jeAfwv3zTCzI+M4d2szOz1cvjK8tsquYx5wSRjL4cClwFvxX6pINCihi6Retpmti/ncDIwCisIXwz4GrqusAnd/n+DW/FKCW84LgdLb25OBRw54Ka4qvweamNky4FbCqTndfSPBW/NPh9v+CpxCkGhnhWX/DdwU1vNj4Fwz+5Dglnf7OM69Ahga1nVMGAvAo8CfS1+Ki7n2xeE1LgDmE7wT8EGc1ykSGZptTSQizKyJu28Pv6s9DxgeJrs6w8zyCKYU7ZDiUETqHD1DF4mOR82sHcEc2lPqWjIXkUOjHrqIiEgE6Bm6iIhIBCihi4iIRIASuoiISAQooYuIiESAErqIiEgEKKGLiIhEwP8H4HnQbkiJTeAAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 576x360 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Overlay the description-length histograms of the five product types.\n",
    "# Only eye_makeup sets bins=35; the other calls use the pandas default of 10 bins.\n",
    "plt.figure(figsize=(8, 5))\n",
    "df[df.product_type == 'eye_makeup'].length.plot(bins=35, kind='hist', color='green', label='eye_makeup', alpha=0.6)\n",
    "df[df.product_type == 'lipstick'].length.plot(kind='hist', color='blue', label='lipstick', alpha=0.6)\n",
    "df[df.product_type == 'foundation'].length.plot(kind='hist', color='red', label='foundation', alpha=0.6)\n",
    "df[df.product_type == 'contour'].length.plot(kind='hist', color='yellow', label='contour', alpha=0.6)\n",
    "df[df.product_type == 'nail_polish'].length.plot(kind='hist', color='pink', label='nail_polish', alpha=0.6)\n",
    "plt.legend()\n",
    "plt.xlabel(\"Description length\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Description length in characters, by category:**\n",
    "\n",
    "Histograms of description length for each category:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAtUAAALGCAYAAACK8W1eAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nOzde5RlVXnv/e+PW7ygCNIiAk0bxQvJiRD7IHk98RgQBfEIZojKiUAS8rbjjRpITBT15KgxJm2G1wwTFQVtDRENaiBiFIKiMVG0QVSg1VYEudNGCKBGbXjeP9YqKYoqu6rWvq76fsaoUXuvtfbez+yea86n5p5rrlQVkiRJkpZvu3EHIEmSJE07k2pJkiSpI5NqSZIkqSOTakmSJKkjk2pJkiSpI5NqSZIkqSOTakmStCIluTDJ7407DvWDSbWmWpJXJ/m7ccchSZJWNpNqaQFJdhh3DJIkaTqYVGukkuyT5CNJtiT5jyRvS7Jdkv+T5OokNyd5X5Jd2uPXJKkkJyT5bpLvJXllu+9w4BXAc5PckeQr7faHJTknyfeTfCvJ/zvr89+b5M9nPX9ykmtnPb8qycuSfBX4gYm1JG1b2+5+uG3bv5PkD5I8NMkPkzx41nGPb4/ZsX3+u0k2JbklySeT7LuIz6okv59kc5Lbk7w2ySOSfD7JbUk+lGSn9thdk3ys/cxb2sd7L/C+eyb5apI/bp/vkuS0JDckuS7JnyfZvt13j29JZ/VVO7TPL0zyl0m+mOQ/k5ydZLcu/8aafCbVGpm2MfoYcDWwBtgLOBP47fbnN4BfBHYG3jbn5f8DeDRwKPB/kzy2qj4B/AXwwarauaoe1x77AeBa4GHAs4G/SHLoEkI9FjgSeFBVbV1aKSVpZUmyHfBPwFdo2vVDgZOBxwEXAs+ZdfjzgTOr6qdJjqYZGPlNYBXwrzTt92IcDjweOBh4KXAq8FvAPsAv07Tj0OQ57wH2BVYDP+Le/QtJ1gCfAd5WVW9oN28AtgKPBA4EngosZf718cDv0vRFW4G/XsJrNYVMqjVKB9E0Ln9SVT+oqv+qqs/RNIRvqqorq+oO4OXA8+aMEr+mqn5UVV+habgfd693pxkJp0nAX9a+/6XAu4HjlhDnX1fVNVX1o6UXUZJWnP8OrKqqP6uqn1TVlcC7gOfRJKbPh58NrBwLvL993QuAv6yqTe0Axl8AByxmtBp4fVXdVlWXA5cB57V9yH8C/0yTBFNV/1FVH66qH1bV7cDrgP855732p0n+X1VVp7ax7gEcAZzc9lc3A29uy7RY76+qy6rqB8CfAs+ZGelWP/nVtkZpH+DqeUZ/H0Yzej3japq6ucesbTfOevxDmtHs+TwM+H7beM5+v7VLiPOaJRwrSSvdvsDDktw6a9v2NCPPZwPvSPKLwKOA/6yqL8563VuTvHHW60Iz2j27T5jPTbMe/2ie5w8FSHI/mmT4cGDXdv8DkmxfVXe2z38L+BZw1pwy7QjckGRm23YsrX+YfezV7fvtPidW9YhJtUbpGmB1kh3mJNbX0zRgM1bTfFV2EzDv3LdZas7z64HdkjxgVmK9GriuffwD4H6zjn/oIt5TkrSwa4DvVNV+8+1M8iGaxPUx3D1KPfO611XVGUOM7SU0UwefUFU3JjkA+DJN8j7j1TRJ998neV6bbF8D/BjYfYFpgIvpS/aZ9Xg18FPge8stiCaf0z80Sl8EbgDWJ7l/kvskeSLNHLo/TPLwJDtz9zzpxcxnvglY087po6quAf4d+Mv2/X8FOBGYabQvBZ6eZLckD6WZ9ydJWr4vAre1F3nfN8n2SX45yX9v97+P5rqZZwKzl0B9B/DyJL8EP7sw8JgBx/YAmpHrW9sLBV81zzE/BY4B7g+8P8l2VXUDcB7wxiQPbC+of0SSmakjlwJPSrK6vbD+5fO87/OT7N+Olv8ZcNas0XH1kEm1RqZtTP4XzUUf
36W5mPC5wOk0oxefBb4D/Bfw4kW+7T+0v/8jySXt42NpLoS8HvgozTy589t976eZk30VTYP5wWUXSJI0u20/gKYN/x7NtSy7tPv/DbgLuKSqrpr1uo8CrwfOTHIbzdzoIwYc3luA+7YxfQH4xAJl+AnNBZMPAU5vB2qOB3YCrgBuoZkesmd7/Pk0/cdXgYtpLsKf6/3Ae2mmL94H+IMBlUkTKlV+0y1JkoYnyaeAv6+qd487llFIciHwdyulvGo4p1qSJA1NOw3kV4Gjxh2LNExO/5AkSUORZAPwLzRL092+rePb1/x6e0Ove/0MN1qpG6d/SJIkSR05Ui1JkiR1ZFItSZIkdTTSCxV33333WrNmzSg/UpKW5eKLL/5eVa0adxwrgX2DpGnx8/qGkSbVa9asYePGjaP8SElaliTbuk2yBsS+QdK0+Hl9g9M/JEnLkuQPk1ye5LIkH2jvYvrwJBcl2Zzkg0l2GneckjQKJtWSpCVLshfNHeLWVtUvA9sDz6O5Q96bq2o/mrvQnTi+KCVpdLaZVCfZJ8mnk2xqRyROare/Osl1SS5tf54+/HAlSRNkB+C+SXYA7gfcABxCcztngA3A0WOKTZJGajFzqrcCL6mqS5I8ALg4yfntvjdX1RuGF54kaRJV1XVJ3gB8F/gRcB5wMXBrVW1tD7sW2Gu+1ydZB6wDWL169fADlqQh2+ZIdVXdUFWXtI9vBzaxQCMpSVoZkuxKc9vphwMPA+4PHDHPofPeYayqTq2qtVW1dtUqF1mRNP2WtPpHkjXAgcBFwBOBFyU5HthIM5p9yzyvGeloxJpTzr3XtqvWHzn0z5WkFeYpwHeqagtAko8A/w/woCQ7tKPVewPXjzFGYP5+AewbJA3Woi9UTLIz8GHg5Kq6DXg78AjgAJp5dG+c73WORkhSL30XODjJ/ZIEOBS4Avg08Oz2mBOAs8cUnySN1KKS6iQ70iTUZ1TVRwCq6qaqurOq7gLeBRw0vDAlSZOkqi6iuSDxEuBrNP3JqcDLgD9K8i3gwcBpYwtSkkZom9M/2hGI04BNVfWmWdv3rKob2qfPAi4bToiSpElUVa8CXjVn85U4yCJpBVrMnOonAscBX0tyabvtFcCxSQ6guQjlKuAFQ4lQkiRJmnDbTKqr6nNA5tn18cGHI0mSJE0f76goSZIkdWRSLUmSJHVkUi1JkiR1ZFItSZIkdWRSLUmSJHVkUi1JkiR1ZFItSZIkdbSYm79MpDWnnDvuECRJkiRgipNqSZLmcsBF0rg4/UOSJEnqyKRakiRJ6sikWpIkSerIpFqSJEnqyKRakiRJ6sikWpIkSerIpFqSJEnqyKRakiRJ6sikWpIkSerIpFqSJEnqyKRakiRJ6sikWpIkSerIpFqSJEnqaJtJdZJ9knw6yaYklyc5qd2+W5Lzk2xuf+86/HAlSZKkybOYkeqtwEuq6rHAwcALk+wPnAJcUFX7ARe0zyVJkqQVZ5tJdVXdUFWXtI9vBzYBewFHARvawzYARw8rSEmSJGmSLWlOdZI1wIHARcAeVXUDNIk38JAFXrMuycYkG7ds2dItWkmSJGkCLTqpTrIz8GHg5Kq6bbGvq6pTq2ptVa1dtWrVcmKUJEmSJtqikuokO9Ik1GdU1UfazTcl2bPdvydw83BClCRJkibbYlb/CHAasKmq3jRr1znACe3jE4CzBx+eJGlSJXlQkrOSfL1dIerXXBlK0kq1mJHqJwLHAYckubT9eTqwHjgsyWbgsPa5JGnleCvwiap6DPA4mgvZXRlK0oq0w7YOqKrPAVlg96GDDUeSNA2SPBB4EvDbAFX1E+AnSY4CntwetgG4EHjZ6COUpNHyjoqSpOX4RWAL8J4kX07y7iT3x5WhJK1QJtWSpOXYAfhV4O1VdSDwA5Yw1cOVoST1jUm1JGk5rgWuraqL2udn0STZrgwlaUUyqZYkLVlV3Qhck+TR7aZDgStwZShJK9Q2L1SUJGkBLwbOSLITcCXw
OzSDNR9KciLwXeCYMcYnSSNjUi1JWpaquhRYO88uV4aStOI4/UOSJEnqyKRakiRJ6sikWpIkSerIpFqSJEnqyKRakiRJ6sikWpIkSerIpFqSJEnqyKRakiRJ6sikWpIkSerIpFqSJEnqyKRakiRJ6sikWpIkSerIpFqSJEnqyKRakiRJ6sikWpIkSerIpFqSJEnqaJtJdZLTk9yc5LJZ216d5Lokl7Y/Tx9umJIkSdLkWsxI9XuBw+fZ/uaqOqD9+fhgw5IkSZKmxzaT6qr6LPD9EcQiSZIkTaUdOrz2RUmOBzYCL6mqW+Y7KMk6YB3A6tWrO3zc8q055dx7bbtq/ZFjiESSJEl9tNwLFd8OPAI4ALgBeONCB1bVqVW1tqrWrlq1apkfJ0mSJE2uZSXVVXVTVd1ZVXcB7wIOGmxYkiRJ0vRYVlKdZM9ZT58FXLbQsZIkSVLfbXNOdZIPAE8Gdk9yLfAq4MlJDgAKuAp4wRBjlCRJkibaNpPqqjp2ns2nDSEWSZIkaSp5R0VJkiSpI5NqSZIkqSOTakmSJKkjk2pJkiSpI5NqSZIkqSOTaknSsiXZPsmXk3ysff7wJBcl2Zzkg0l2GneMkjQKJtWSpC5OAjbNev564M1VtR9wC3DiWKKSpBEzqZYkLUuSvYEjgXe3zwMcApzVHrIBOHo80UnSaJlUS5KW6y3AS4G72ucPBm6tqq3t82uBveZ7YZJ1STYm2bhly5bhRypJQ2ZSLUlasiTPAG6uqotnb57n0Jrv9VV1alWtraq1q1atGkqMkjRK27xNuSRJ83gi8MwkTwfuAzyQZuT6QUl2aEer9wauH2OMkjQyjlRLkpasql5eVXtX1RrgecCnquq3gE8Dz24POwE4e0whStJImVRLkgbpZcAfJfkWzRzr08YcjySNhNM/JEmdVNWFwIXt4yuBg8YZjySNg0m1JGlFWnPKuffadtX6I8cQiaQ+cPqHJEmS1JFJtSRJktSRSbUkSZLUkUm1JEmS1NFUXKg438Ukw3rPUV6kMgkxSJIkqbupSKolSZptGIMtktTFNqd/JDk9yc1JLpu1bbck5yfZ3P7edbhhSpIkSZNrMXOq3wscPmfbKcAFVbUfcEH7XJIkSVqRtplUV9Vnge/P2XwUsKF9vAE4esBxSZIkSVNjuat/7FFVNwC0vx8yuJAkSZKk6TL0JfWSrEuyMcnGLVu2DPvjJEmSpJFbblJ9U5I9AdrfNy90YFWdWlVrq2rtqlWrlvlxkiRJ0uRablJ9DnBC+/gE4OzBhCNJkiRNn8UsqfcB4PPAo5Ncm+REYD1wWJLNwGHtc0mSJGlF2ubNX6rq2AV2HTrgWCRJkqSpNPQLFSVJkqS+M6mWJEmSOjKpliRJkjoyqZYkSZI6MqmWJEmSOjKpliRJkjoyqZYkSZI62uY61SvNmlPOvde2q9YfOYZIJEmSNC1MqiVJas03sAIOrkjaNqd/SJIkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR15oeIEcgUSSZKk6eJItSRpyZLsk+TTSTYluTzJSe323ZKcn2Rz+3vXcccqSaNgUi1JWo6twEuq6rHAwcALk+wPnAJcUFX7ARe0zyWp90yqJUlLVlU3VNUl7ePbgU3AXsBRwIb2sA3A0eOJUJJGy6RaktRJkjXAgcBFwB5VdQM0iTfwkAVesy7JxiQbt2zZMqpQJWloTKolScuWZGfgw8DJVXXbYl9XVadW1dqqWrtq1arhBShJI2JSLUlaliQ70iTUZ1TVR9rNNyXZs92/J3DzuOKTpFEyqZYkLVmSAKcBm6rqTbN2nQOc0D4+ATh71LFJ0ji4TrUkaTmeCBwHfC3Jpe22VwDrgQ8lORH4LnDMmOKTpJHqlFQnuQq4HbgT2FpVawcRlCRpslXV54AssPvQUcYiSZNgECPVv1FV3xvA+0iSNJG8062kbXFOtSRJktRR16S6gPOSXJxk3SACkiRJkqZN1+kfT6yq65M8BDg/yder
6rOzD2iT7XUAq1ev7vhx02u+rw4lSZLUD51Gqqvq+vb3zcBHgYPmOcYF/iVJktRry06qk9w/yQNmHgNPBS4bVGCSJEnStOgy/WMP4KPN+v/sAPx9VX1iIFFJkiRJU2TZSXVVXQk8boCxSJIkSVPJOypKkjRArmktrUwm1QM2rFU+FnpfG2pJkqTx8+YvkiRJUkcm1ZIkSVJHTv+QJGkZvKmXpNkcqZYkSZI6cqRakqQxWMpItxelS5PPpHoRJvkrPpdukiRJGj+TakmShmySB2ckDYZzqiVJkqSOTKolSZKkjkyqJUmSpI5MqiVJkqSOvFCxhxa6IGa+VUFcPUSSJKk7k2pJkibcUgZLJI2H0z8kSZKkjhypliRpSjmFT5ocjlRLkiRJHTlSLUlSjzj/erj8dkALMaleQfpwm9yujdkg/g1G2XjaeC+NyYQkaVxMqiVJWgGWMqgwTX+ITsIf05MQg8bPOdWSJElSR51GqpMcDrwV2B54d1WtH0hUkqSpZd8w/fo69ayvo/WaDMseqU6yPfA3wBHA/sCxSfYfVGCSpOlj3yBppeoyUn0Q8K2quhIgyZnAUcAVgwhMkjSV7Bt6algXu/d5RHiU/2ZdP2uh/4c+fGsxqjJ0mVO9F3DNrOfXttskSSuXfYOkFSlVtbwXJscAT6uq32ufHwccVFUvnnPcOmBd+/TRwDeW+FG7A99bVpCTx7JMJssymcZdln2ratUYP38q2TcsmeWYHH0oA1iOYVuwb+gy/eNaYJ9Zz/cGrp97UFWdCpy63A9JsrGq1i739ZPEskwmyzKZ+lSWFca+YQksx+ToQxnAcoxTl+kfXwL2S/LwJDsBzwPOGUxYkqQpZd8gaUVa9kh1VW1N8iLgkzTLJp1eVZcPLDJJ0tSxb5C0UnVap7qqPg58fECxLGTZXw9OIMsymSzLZOpTWVYU+4YlsRyTow9lAMsxNsu+UFGSJElSw9uUS5IkSR2ZVEuSJEkddZpTPWhJHkNz5629gKJZhumcqto01sAkSWNj3yBpGkzMnOokLwOOBc6kWecUmvVNnwecWVXrxxWbIMkuwOHcs1P7ZFXdOtbAliFJaG6lPLssX6xJORkWqS/lmNGnOqbBsW+YPH05V/vShvaoHFNfryYpqf4m8EtV9dM523cCLq+q/cYT2fL1oYIAJDkeeBVwHnBdu3lv4DDgNVX1vnHFtlRJngr8LbCZe5blkcDvV9V544ptKfpSjhl9qmMarL71DdPeL/TlXO1LG9qjcvSjXk1QUv11mlvbXj1n+77AeVX16PFEtjx9qSAASb4BPGFuo59kV+CiqnrUeCJbuiSbgCOq6qo52x8OfLyqHjuWwJaoL+WY0ac6psHqU9/Qh36hL+dqX9rQHpWjF/VqkuZUnwxckGQzcE27bTXNX1svGltUy/dK4PELVRBg4hvPWUIzojLXXe2+abIDd3+FPNt1wI4jjqWLvpRjRp/qmAarT31DH/qFvpyrfWlD+1KOXtSriUmqq+oTSR7F3fOCQlNRvlRVd441uOXpRQVpvQ64JMl53LNTOwx47diiWp7TgS8lOZO7y7IPzfzM08YW1dL1pRwz+lTHNEA96xv60C/05VztSxval3L0ol5NzPSPvklyAvB/ab7mu1cFqar3jim0ZWlHUp7GPTu1T1bVLWMNbBmS7A88k3uW5ZyqumKsgS1RX8oxo091TJpPX/qFvpyrfWlDe1SOqa9XJtVD1IcKMluSPZh1cU1V3TTmkDpJshtQ0/r/MaMv5YD+1TFprr70C306V/vShvahHNNer0yqh2zaKwhAkgOAdwC70HQAobm45laaq4svGWN4S5JkNfBXwCHAf7abdwE+BZwy92KPSdWXcszoUx2TtmWa+4W+nKt9aUN7VI5+1CuT6uHoSwUBSHIp8IKqumjO9oOBd1bV48YT2dIl+TzwFuCsmfmYSbYHjgFOrqqDxxnfYvWlHDP6VMekhfShX+jLudqXNrRH5ehHvTKp
Ho6+VBCAJJsXWgs2ybeq6pGjjmm5tlGWBfdNmr6UY0af6pi0kD70C305V/vShq6QckxNvZqY1T966P5zG06AqvpCkvuPI6AO/jnJuTTLPc2+uvh44BNji2p5Lk7yt8AG7lmWE4Avjy2qpetLOWb0qY5JC+lDv9CXc7UvbWhfytGLeuVI9ZAk+WvgEcxfQb5TVVO1vmqSI4CjuPfVxR8fa2BL1N6F7UTmKQtwWlX9eIzhLVpfyjFbX+qYtJC+9At9OFf70ob2pRzQk3plUj08faggkqTBsV+Q+sukWtuUZBfg5TQdwUPazTcDZwPr594dbJIl2YHmr/qjmXX1PU1ZTquqn44xvEXrSzlm9KmOSX3Wl3O1L21oj8rRj3plUj0cfakgAEk+SbM8z4aqurHd9lDgt4FDq+qwMYa3JEk+QHOl/QbuvrXr3jTzz3arqueOK7al6Es5ZvSpjkkL6UO/0JdztS9taI/K0Y96ZVI9HH2pIABJvlFVj17qvkm0jbJ8s6oeNeqYlqMv5ZjRpzomLaQP/UJfztW+tKErpBxTU6+2G3cAPbamql4/03ACVNWNVbWe5ra00+TqJC9tb1gANDcvSPIy7r7YZlrckuSYJD+r+0m2S/JcYJruQtWXcszoUx2TFtKHfqEv52pf2tC+lKMX9cqkenh6UUFazwUeDHwmyS1Jvg9cCOwGPGecgS3D84BnAzcm+WaSbwI3Ar/Z7psWM+W4qS3HZqazHDP6VMekhfShX+jLuWpfMFl6Ua+c/jEkSXYFTuGec+duolnmZn1VTdNfkCR5DM08rS9U1R2zth9eVVOzhiRAkifQXMzxbeCxwMHAFdN69X2SB9OsIvCWqnr+uOMZhCS/DhwEfK2qzht3PNIg9KVf6Et/YF8wuaa1DzCpHoMkv1NV7xl3HIuV5A+AFwKbgAOAk6rq7HbfJVX1q+OMbymSvAo4gubGR+fTnLSfAZ4CfLKqXjfG8BYtyTnzbD6EZr4mVfXM0UbUTZIvVtVB7ePfo6lv/wg8Ffin9utxqbempV/oS39gXzBZ+tIHmFSPQZLvVtW0zJ8jydeAX6uqO5KsAc4C3l9Vb03y5ao6cKwBLkFblgOAX6D5imzvqrotyX2Bi6rqV8Ya4CIluQS4Ang3zUhLgA/Qft1XVZ8ZX3RLN7seJfkS8PSq2pLmLnNfqKr/Nt4IpeGaln6hL/2BfcFk6Usf4G3KhyTJVxfaBeyxwL5Jtf3MV3xVdVWSJwNnJdmXpjzTZGtV3Qn8MMm3q+o2gKr6UZK7xhzbUqwFTgJeCfxJVV2a5EfT0oDOY7v2q/HtaP7Y3wJQVT9IsnW8oUmD0ZN+oS/9gX3BZOlFH2BSPTx7AE/j3lffBvj30YfTyY1JDqiqSwHaEYpnAKcDU/HX4yw/SXK/qvoh8PiZjWnWj52ahrSq7gLenOQf2t83Md3n8y7AxTTnRyV5aFXdmGRnpqujln6ePvQLfekP7AsmSy/6gGn8h58WHwN2nml4Zkty4ejD6eR44B5/KVbVVuD4JO8cT0jL9qSq+jH8rDGasSPNYvlTpaquBY5JciRw27jjWa6qWrPArruAZ40wFGmY+tAv9KU/sC+YIH3pA5xTLUmSJHXkOtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVGogkj07y5SS3J/mDEX7uVUmesszX/nqSbww6JknS0s2050lekeTdQ3j/30py3iKOe3WSvxv056v/dhh3AOqNlwIX
VtWB4w5kIUkK2K+qvgVQVf8KPHq8UUmSZquqv+j6HknWAN8Bdqyqre37ngGc0fW9pYU4Uq1B2Re4fNxBSJIkjYNJtTpL8ingN4C3JbkjyeOSvC/JliRXJ/k/SbZrj73H12pJ1iSpJDu0zy9M8tok/9ZOJTkvye6zjj+ufc//SPLKOXEclOTzSW5NckOStyXZqd332fawr7QxPjfJk5NcO+v1j20//9Yklyd55qx9703yN0nObeO6KMkjhvDPKUkr2ux+YlYfsS7J9W3b/pJZxx6UZGOS25LclORN7a6ZNv/Wts3/tSS/neRzs177S0nOT/L99rWvmCeWHZN8IMmHZ/oTaSEm1eqsqg4B/hV4UVXtDLwE2AX4ReB/AscDv7OEt/zf7fEPAXYC/hggyf7A24HjgIcBDwb2nvW6O4E/BHYHfg04FPj9NsYntcc8rqp2rqoPzv7AJDsC/wSc137ui4EzksyeHnIs8BpgV+BbwOuWUCZJ0vL9BrAf8FTglFnX0rwVeGtVPRB4BPChdvtMm/+gts3//Ow3S/IA4F+AT9D0J48ELphzzH2BfwR+DDynqn4y8FKpV0yqNVBJtgeeC7y8qm6vqquAN9Ikwov1nqr6ZlX9iKaBPKDd/mzgY1X12ar6MfCnwF0zL6qqi6vqC1W1tf3cd9Ik9YtxMLAzsL6qflJVnwI+RpNIz/hIVX2xnZ93xqy4JEnD9Zqq+kFVfQ14D3e3zT8FHplk96q6o6q+sMj3ewZwY1W9sar+q+2vLpq1/4E0Cfe3gd+pqjsHVRD1l0m1Bm13mtHlq2dtuxrYawnvceOsxz+kSXahGU24ZmZHVf0A+I+Z50keleRjSW5MchvwF208i/Ew4JqqumvWtrlxLxSXJGm4rpn1+GqaNhvgROBRwNeTfCnJMxb5fvvQJMwLORj4FZqBllpqsFqZTKo1aN+jGTnYd9a21cB17eMfAPebte+hS3jvG2gaQgCS3I9mCsiMtwNfp1nh44HAK4As8r2vB/aZmfs9T9ySpPHZZ9bj1TRtNlW1uaqOpZm293rgrCT3B7aVCF9DM11kIecBfwlckGSPZUetFcWkWgPVfkX2IeB1SR6QZF/gj4CZixMvBZ6UZHWSXYCXL+HtzwKekUKP718AACAASURBVOR/tBeM/Bn3rMMPAG4D7kjyGOD/m/P6m2jmec/nIpqE/6XthSlPBv4XcOYS4pMkDcefJrlfkl+iuebmgwBJnp9kVfst463tsXcCW2imBy7U5n8MeGiSk5P8QttfPWH2AVX1V8Df0yTWi/3WUyuYSbWG4cU0CeqVwOdoGqXTAarqfJrG8KvAxTQN26JU1eXAC9v3uwG4Bbh21iF/THOR4+3Au9rPme3VwIZ2dY/nzHnvnwDPBI6gGW3/W+D4qvr6YuOTJA3NZ2guEL8AeENVzdzE5XDg8iR30Fy0+Lx2jvQPaS4m/7e2zT949ptV1e3AYTSDJzcCm2kuhmTOca+luVjxX5LsNpyiqS/iVCFJkjSJ5ruJizSpHKmWJEmSOjKpliRJkjpy+ockSZLUkSPVkiRJUkcm1ZIkSVJHO4zyw3bfffdas2bNKD9Skpbl4osv/l5VrRp3HJMuyfbARuC6qnpGkofTrO++G3AJcFy7ZOWC7BskTYuf1zeMNKles2YNGzduHOVHStKyJLl63DFMiZOATcAD2+evB95cVWcmeQfNbaTf/vPewL5B0rT4eX2D0z8kScuSZG/gSODd7fMAh9Dc/RRgA3D0eKKTpNEyqZYkLddbgJfS3A4a4MHArbNu0nEtsNd8L0yyLsnGJBu3bNky/EglachMqiVJS5bkGcDNVXXx7M3zHDrvuq1VdWpVra2qtatWOXVd0vQb6ZxqSVJvPBF4ZpKnA/ehmVP9FuBBSXZoR6v3Bq4fY4ySNDKOVEuSlqyqXl5Ve1fVGuB5wKeq6reATwPPbg87ATh7TCFK0kitiJHqNaece69tV60/cgyRSFLvvQw4M8mf
A18GThtzPPP2AWA/IGmwVkRSLUkanqq6ELiwfXwlcNA445GkcXD6hyRJktSRSbUkSZLUkUm1JEmS1FHv5lQvdEGKJEmSNCyOVEuSJEkdbTOpTnKfJF9M8pUklyd5Tbv94UkuSrI5yQeT7DT8cCVJkqTJs5jpHz8GDqmqO5LsCHwuyT8DfwS8uarOTPIO4ETg7UOMVZKkn8spgJLGZZsj1dW4o326Y/tTwCHAWe32DcDRQ4lQkiRJmnCLmlOdZPsklwI3A+cD3wZuraqt7SHXAnst8Np1STYm2bhly5ZBxCxJkiRNlEUl1VV1Z1UdAOxNc6esx8532AKvPbWq1lbV2lWrVi0/UkmSJGlCLWn1j6q6leZWtAcDD0oyMyd7b+D6wYYmSZIkTYfFrP6xKsmD2sf3BZ4CbAI+DTy7PewE4OxhBSlJkiRNssWs/rEnsCHJ9jRJ+Ieq6mNJrgDOTPLnwJeB04YYpyRJkjSxtplUV9VXgQPn2X4lzfxqSZIkaUXzjoqSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJH20yqk+yT5NNJNiW5PMlJ7fZXJ7kuyaXtz9OHH64kSZI0eXZYxDFbgZdU1SVJHgBcnOT8dt+bq+oNwwtPkiRJmnzbHKmuqhuq6pL28e3AJmCvYQcmSZpcSe6T5ItJvtJ+i/madvvDk1yUZHOSDybZadyxStIoLGlOdZI1wIHARe2mFyX5apLTk+y6wGvWJdmYZOOWLVs6BStJmhg/Bg6pqscBBwCHJzkYeD3Nt5j7AbcAJ44xRkkamUUn1Ul2Bj4MnFxVtwFvBx5B05jeALxxvtdV1alVtbaq1q5atWoAIUuSxq0ad7RPd2x/CjgEOKvdvgE4egzhSdLILSqpTrIjTUJ9RlV9BKCqbqqqO6vqLuBdwEHDC1OSNGmSbJ/kUuBm4Hzg28CtVbW1PeRaFpgu6LeYkvpmMat/BDgN2FRVb5q1fc9Zhz0LuGzw4UmSJlU7sHIAsDfNwMpj5ztsgdf6LaakXlnM6h9PBI4DvtaOSAC8Ajg2yQE0DeZVwAuGEqEkaaJV1a1JLgQOBh6UZId2tHpv4PqxBidJI7LNpLqqPgdknl0fH3w4kqRpkGQV8NM2ob4v8BSaixQ/DTwbOBM4ATh7fFFK0ugsZqRakqS59gQ2JNmeZirhh6rqY0muAM5M8ufAl2mmD0pS75lUS5KWrKq+SrPE6tztV+KF65JWoCWtUy1JkiTp3kyqJUmSpI5MqiVJkqSOTKolSZKkjkyqJUmSpI5MqiVJkqSOpnZJvTWnnDvuECRJkiTAkWpJkiSpM5NqSZIkqSOTakmSJKkjk2pJkiSpo6m9ULGrhS50vGr9kSOORJIkSdPOkWpJkiSpo20m1Un2SfLpJJuSXJ7kpHb7bknOT7K5/b3r8MOVJEmSJs9iRqq3Ai+pqscCBwMvTLI/cApwQVXtB1zQPpckSZJWnG0m1VV1Q1Vd0j6+HdgE7AUcBWxoD9sAHD2sICVJkqRJtqQ51UnWAAcCFwF7VNUN0CTewEMWeM26JBuTbNyyZUu3aCVJkqQJtOikOsnOwIeBk6vqtsW+rqpOraq1VbV21apVy4lRkiRJmmiLSqqT7EiTUJ9RVR9pN9+UZM92/57AzcMJUZIkSZpsi1n9I8BpwKaqetOsXecAJ7SPTwDOHnx4kiRJ0uRbzM1fnggcB3wtyaXttlcA64EPJTkR+C5wzHBClCRJkibbNpPqqvockAV2HzrYcCRJkqTp4x0VJUmSpI5MqiVJkqSOTKolSZKkjkyqJUmSpI4Ws/qHJEm9s+aUc++17ar1R44hEkl94Ei1JEmS1JFJtSRJktSRSbUkSZLUkUm1JGnJkuyT5NNJNiW5PMlJ7fbdkpyfZHP7e9dx
xypJo2BSLUlajq3AS6rqscDBwAuT7A+cAlxQVfsBF7TPJan3TKolSUtWVTdU1SXt49uBTcBewFHAhvawDcDR44lQkkbLJfUkSZ0kWQMcCFwE7FFVN0CTeCd5yAKvWQesA1i9evWSP3O+5fAkaZwcqZYkLVuSnYEPAydX1W2LfV1VnVpVa6tq7apVq4YXoCSNiEm1JGlZkuxIk1CfUVUfaTfflGTPdv+ewM3jik+SRsmkWpK0ZEkCnAZsqqo3zdp1DnBC+/gE4OxRxyZJ47DNpDrJ6UluTnLZrG2vTnJdkkvbn6cPN0xJ0oR5InAccMicvmA9cFiSzcBh7XNJ6r3FXKj4XuBtwPvmbH9zVb1h4BFJkiZeVX0OyAK7Dx1lLJI0CbY5Ul1VnwW+P4JYJEmSpKnUZU71i5J8tZ0esuAds5KsS7IxycYtW7Z0+DhJkiRpMi13neq3A68Fqv39RuB35zuwqk4FTgVYu3ZtLfPzJEkauoXWv75q/ZEjjkTStFnWSHVV3VRVd1bVXcC7gIMGG5YkSZI0PZaVVM+sQdp6FnDZQsdKkiRJfbfN6R9JPgA8Gdg9ybXAq4AnJzmAZvrHVcALhhijJEmSNNG2mVRX1bHzbD5tCLFIkiRJU8k7KkqSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR1tM6lOcnqSm5NcNmvbbknOT7K5/b3rcMOUJEmSJtdiRqrfCxw+Z9spwAVVtR9wQftckiRJWpG2mVRX1WeB78/ZfBSwoX28ATh6wHFJkiRJU2O5c6r3qKobANrfD1nowCTrkmxMsnHLli3L/DhJkiRpcg39QsWqOrWq1lbV2lWrVg374yRJkqSRW25SfVOSPQHa3zcPLiRJkiRpuiw3qT4HOKF9fAJw9mDCkSRJkqbPYpbU+wDweeDRSa5NciKwHjgsyWbgsPa5JEmStCLtsK0DqurYBXYdOuBYFrTmlHNH9VGSJEnSknlHRUnSknljMEm6J5NqSdJyvBdvDCZJP2NSLUlaMm8MJkn3ZFItSRoUbwwmacUyqZYkjZw3BpPUNybVkqRB8cZgklYsk2pJ0qB4YzBJK5ZJtSRpybwxmCTd0zZv/iJJ0lyTcGMwSZokjlRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkddVpSL8lVwO3AncDWqlo7iKAkSZoka045917brlp/5BgikTSpBrFO9W9U1fcG8D6SJEnSVHL6hyRJktRR15HqAs5LUsA7q+rUuQckWQesA1i9enXHjxu++b7iW8gkfPXnV5KSJEnj13Wk+olV9avAEcALkzxp7gFVdWpVra2qtatWrer4cZIkSdLk6ZRUV9X17e+bgY8CBw0iKEmSJGmaLDupTnL/JA+YeQw8FbhsUIFJkiRJ06LLnOo9gI8mmXmfv6+qTwwkKkmSJGmKLDuprqorgccNMJZeWOhCRy8elCTNNm0Xxkv6+VxST5IkSerIpFqSJEnqyKRakiRJ6mgQtymXJElj4A3ApMlhUr3C2SBLkiR15/QPSZIkqSOTakmSJKkjp39IkjRA455W5/0SpPFwpFqSJEnqyJHqDpZyN6yur1/KCIOjFJpW4x7hkyRpuRypliRJkjpypFqSpGVYyreNXb/ZlDT5HKmWJEmSOjKpliRJkjpy+scKMowLKwdxEVnXuIZ1IdskXPA5rItZl/JZ873vSvu3kSRpWzol1UkOB94KbA+8u6rWDyQqSdLUsm8YvFHPyV7KIMq4V+2ZhD/yl2KU8fb536brscP4N1j29I8k2wN/AxwB7A8cm2T/QQUmSZo+9g2SVqouc6oPAr5VVVdW1U+AM4GjBhOWJGlK2TdIWpG6JNV7AdfMen5tu02StHLZN0hakVJVy3th
cgzwtKr6vfb5ccBBVfXiOcetA9a1Tx8NfGMJH7M78L1lBTg9LGM/WMbpN7d8+1bVqnEFM61G1DdAP+pjH8oA/SiHZZgck16OBfuGLhcqXgvsM+v53sD1cw+qqlOBU5fzAUk2VtXa5YU3HSxjP1jG6df38o3Q0PsG6Mf/Vx/KAP0oh2WYHNNcji7TP74E7Jfk4Ul2Ap4HnDOYsCRJU8q+QdKKtOyR6qramuRFwCdplk06vaouH1hkkqSpY98gaaXqtE51VX0c+PiAYpnPsr8anCKWsR8s4/Tre/lGZgR9A/Tj/6sPZYB+lMMyTI6pLceyL1SUJEmS1Ogyp1qSJEkSJtWSJElSZ53mVA9aksfQ3HlrL6BolmE6p6o2jTUwSdLY2DdImgYTM6c6ycuAY2luaXttu3lvmuWYzqyq9eOKTUuTZBfgcO7ZAX6yqm4da2ADlCQ0t2OeXcYv1qScUB31vXywMuppH9g3TJa+nDd9aOP6UAboT52CyUqqvwn8UlX9dM72nYDLq2q/8UQ2WH2qPPNJcjzwKuA84Lp2897AYcBrqup944ptUJI8FfhbYDP3LOMjgd+vqvPGFdsg9L18sDLqaV/0qW+Y9va/L+dNH9q4PpQB+lOnZkxSUv11mlvbXj1n+77AeVX16PFENjh9qzzzSfIN4AlzO4kkuwIXVdWjxhPZ4CTZBBxRVVfN2f5w4ONV9dixBDYgfS8frIx62hd96Rv60P735bzpQxvXhzJAf+rUjEmaU30ycEGSzcA17bbVNH91vWhsUQ3WK4HHL1R5gIlvVBchNCMwc93V7uuDHbj7a+jZrgN2HHEsw9D38sHKqKd90Ze+oQ/tf1/Omz60cX0oA/SnTgETlFRX1SeSPIq75weFpsJ8qaruHGtwg9OryrOA1wGXJDmPe3aAhwGvHVtUg3U68KUkZ3J3GfehmeN52tiiGpy+lw9WRj3thR71DX1o//ty3vShjetDGaA/dQqYoOkfK0GSE4D/S/P1370qT1W9d0yhDVQ78vI07tkBfrKqbhlrYAOUZH/gmdyzjOdU1RVjDWxA+l4+WBn1VJOjL+1/X86bPrRxfSgD9KdOgUn1yPWp8vw8SfZg1sU4VXXTmEMaiiS7AdW3/78ZK6B8K6KeajL0pf3v03nThzauJ2XoRZ0yqR6DvlSe+SQ5AHgHsAtNhxGai3Fupbki+ZIxhjcQSVYDfwUcAvxnu3kX4FPAKXMvHJk2fS8frIx6qsk0ze1/X86bPrRxfSgD9KdOzTCpHqG+VZ75JLkUeEFVXTRn+8HAO6vqceOJbHCSfB54C3DWzJzOJNsDxwAnV9XB44yvq76XD1ZGPdVk6UP735fzpg9tXB/KAP2pUzNMqkeob5VnPkk2L7RubJJvVdUjRx3ToG2jjAvumxZ9Lx+sjHqqydKH9r8v500f2rg+lAH6U6dmTMzqHyvE/ec2qABV9YUk9x9HQEPwz0nOpVkeavYVyccDnxhbVIN1cZK/BTZwzzKeAHx5bFENTt/LByujnmqy9KH978t504c2rg9lgP7UKcCR6pFK8tfAI5i/8nynqqZpzdUFJTkCOIp7X5H88bEGNiDtndxOZJ4yAqdV1Y/HGF5nfS/fjL7XU02WvrT/fThv+tDG9aEMM/pQp2aYVI9YnyqPJGnxbP+lfjOp1kAl2QV4OU3H8ZB2883A2cD6uXcTm0ZJdqAZITiaWVfx05TxtKr66RjD66zv5YOVUU+lQevLedOHNq4PZYD+1KkZJtUj1LfKM58kn6RZ0mdDVd3Ybnso8NvAoVV12BjDG4gkH6C5Yn8Dd98mdm+auWy7VdVzxxXbIPS9fLAy6qkmSx/a/76cN31o4/pQBuhPnZphUj1Cfas880nyjap69FL3TZNtlPGbVfWoUcc0SH0vH6yMeqrJ0of2vy/nTR/auD6UAfpTp2ZsN+4AVpg1VfX6mQYVoKpurKr1NLer7YOrk7y0vcEB0NzsIMnLuPvinGl3S5Jjkvzs
/EmyXZLnAlN7R6tZ+l4+WBn1VJOlD+1/X86bPrRxfSgD9KdOASbVo9aryrOA5wIPBj6T5JYk3wcuBHYDnjPOwAboecCzgRuTfDPJN4Ebgd9s9027mfLd1JZvM/0qH6yMeqrJ0of2vy/nTR/a8L60032pU4DTP0Yqya7AKdxzTt1NNEvgrK+qafrrckFJHkMzt+sLVXXHrO2HV9XUrTs5nyRPoLkw5NvAY4GDgSv6dhV/kgfTrFLwlqp6/rjjGZYkvw4cBHytqs4bdzzqn760/31p3/vUhvepnZ72ttikekIk+Z2qes+44+gqyR8ALwQ2AQcAJ1XV2e2+S6rqV8cZ3yAkeRVwBM3Nk86naQA+AzwF+GRVvW6M4XWW5Jx5Nh9CMx+UqnrmaCMavCRfrKqD2se/R1Nn/xF4KvBP7Vfy0khMS/vfl/a9D214X9rpvrXFJtUTIsl3q2pa5tUtKMnXgF+rqjuSrAHOAt5fVW9N8uWqOnCsAQ5AW8YDgF+g+bpt76q6Lcl9gYuq6lfGGmBHSS4BrgDeTTOSE+ADtF8pVtVnxhfdYMyui0m+BDy9qrakubPdF6rqv403Qq0k09L+96V970Mb3pd2um9tsbcpH6EkX11oF7DHAvumzfYzXwlW1VVJngyclWRfmnL2wdaquhP4YZJvV9VtAFX1oyR3jTm2QVgLnAS8EviTqro0yY+mpZFepO3ar+O3oxlc2AJQVT9IsnW8oamPetL+96V970Mb3pd2uldtsUn1aO0BPI17X5kb4N9HH85Q3JjkgKq6FKAd0XgGcDowVX9x/hw/SXK/qvoh8PiZjWnWoZ2WBnlBVXUX8OYk/9D+von+tRW7ABfTnHuV5KFVdWOSnZmu5EDTow/tf1/a96lvw3vUTveqLZ7G/4Bp9jFg55kGabYkF44+nKE4HrjHX5dVtRU4Psk7xxPSwD2pqn4MP2vYZuxIs/B+L1TVtcAxSY4Ebht3PINUVWsW2HUX8KwRhqKVow/tf1/a99604dPeTvetLXZOtSRJktSR61RLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkiRJHZlUS5IkSR2ZVEuSJEkdmVRLkvT/t3fvQZKV9RnHvw9sNCIgqBMUl3WxICTGKJLxEq1cFIiLWGJSMUJFXdS4VhmFmIuuMRUrlUpqczOaKNEtuSVSYIQYiHiBqGiZRHQXiRFXxELuCksE4jW68ssffTYMw15m5+2d02f6+6ma2u73nO7zTDNUPXPmPe+RpEaWavUuyYeSrO0en5rkU3vxWKuTVJIV84+9m9dVkiP2Vi5JkjRsK/oOIFXVCdN4bEmStHx4plqSJElqZKnW2CS5IcnvJvl8knuSvDfJjyY5OMkHkmxNclf3eOWc112R5Df28FiV5LQk1ye5M8lfJNmn27ZPkj9IcmOSO5L8fZKH7eR9/v/YSY5I8oku+51J3jtv9+OSXNd9D+9Ikj38iCRJ0jJlqda4/RqwBjgceCJwKqOfs7OBxwKrgO8Cbx/DsX4ZmAWOAU4CXt6Nn9p9PQt4HLD/Ao/3x8BlwMHASuBv521/HvAU4EmMvs/ntISXJEnLh6Va4/Y3VXVbVX0D+Bfg6Kr676q6qKq+U1XfBP4E+IUxHOvPquobVXUT8FbglG7814G3VNX1VfUt4I3AydsvTtyFHzAq/odW1feqav4Fkxuq6u7ueB8Hjh7D9yBJkpYBS7XG7etzHn8H2D/Jfkne1U3H+B/gk8BBSfZtPNbNcx7fCBzaPT60ez532wrgkN283+uB
AJ9Jck2Sl8/b/oDvbY8TS5KkZclSraXwO8BRwNOq6kDg57vx1jnJh815vAq4rXt8G6MzznO3bQNu39WbVdXXq+qVVXUo8CrgDJfRkyRJC2Gp1lI4gNE86ruTPBx485je9/e6iyAPA04Htl9YeD7wuiSHJ9kf+FPgvVW1bVdvluSFcy6gvAso4IdjyipJkpYxS7WWwluBhwB3Ap8GPjym970Y2AxcDVwKnNmNnwX8A6NpJl8Fvge8dgHv9xTgyiTfAi4BTq+qr44pqyRJWsZSVX1nkPZYkgKOrKqv9J1FkiTJM9WSJElSI29TromU5OeAD+1oW1W56oYkSZooTv+QJEmSGjn9Q5IkSWpkqZYkSZIaLemc6kc+8pG1evXqpTykJC3K5s2b76yqmb5zSJKGYUlL9erVq9m0adNSHlKSFiXJjbvfS5KkEad/SJIkSY0s1ZIkSVIjS7UkSZLUyFItSZIkNbJUS5IkSY0GcZvy1esvfcDYDRtO7CGJJEmS9ECeqZYkSZIaWaolSZKkRpZqSZIkqZGlWpIkSWpkqZYkSZIaWaolSZKkRpZqSZIkqZGlWpIkSWq021Kd5KwkdyT5wpyxv0jypSSfT/L+JAft3ZiSJEnS5FrImepzgDXzxi4HnlBVTwS+DLxxzLkkSZKkwdhtqa6qTwLfmDd2WVVt655+Gli5F7JJkiRJgzCOOdUvBz60s41J1iXZlGTT1q1bx3A4SZIkabI0leokbwK2AeftbJ+q2lhVs1U1OzMz03I4SZIkaSKtWOwLk6wFngccW1U1vkiSJEnSsCyqVCdZA7wB+IWq+s54I0mSJEnDspAl9c4H/gM4KsktSV4BvB04ALg8ydVJ3rmXc0qSJEkTa7dnqqvqlB0Mn7kXskiSJEmD5B0VJUmSpEaWakmSJKmRpVqSJElqZKmWJEmSGlmqJUmSpEaWakmSJKmRpVqSJElqZKmWJEmSGlmqJUmSpEaWakmSJKmRpVqSJElqZKmWJEmSGlmqJUmSpEaWakmSJKmRpVqSJElqtNtSneSsJHck+cKcsYcnuTzJdd2/B+/dmJIkSdLkWsiZ6nOANfPG1gMfraojgY92zyVJkqSptNtSXVWfBL4xb/gk4Nzu8bnAC8acS5IkSRqMxc6pPqSqvgbQ/ftjO9sxybokm5Js2rp16yIPJ0mSJE2uvX6hYlVtrKrZqpqdmZnZ24eTJEmSltxiS/XtSR4N0P17x/giSZIkScOy2FJ9CbC2e7wWuHg8cSRJkqThWciSeucD/wEcleSWJK8ANgDHJ7kOOL57LkmSJE2lFbvboapO2cmmY8ecRZIkSRok76goSZIkNbJUS5IkSY0s1ZIkSVIjS7UkSZLUyFItSZIkNbJUS5IkSY0s1ZIkSVIjS7UkSZLUyFItSZIkNbJUS5IkSY0s1ZIkSVIjS7UkSZLUyFItSZIkNbJUS5IkSY2aSnWS1yW5JskXkpyf5EfHFUySJEkaikWX6iSPAU4DZqvqCcC+wMnjCiZJkiQNRev0jxXAQ5KsAPYDbmuPJEmSJA3Lokt1Vd0K/CVwE/A14J6qumxcwSRJkqShWLHYFyY5GDgJOBy4G3hfkhdX1Xvm7bcOWAewatWqhqharlavv/QBYzdsOLGHJJIkSYvTMv3jOOCrVbW1qn4A/BPwjPk7VdXGqpqtqtmZmZmGw0mSJEmTqaVU3wQ8Pcl+SQIcC2wZTyxJkiRpOFrmVF8JXAhcBfxX914bx5RLkiRJGoxFz6kGqKo3A28eUxZJkiRpkLyjoiRJktTIUi1JkiQ1slRLkiRJjSzVkiRJUiNLtSRJktTIUi1JkiQ1slRLkiRJjSzVkiRJUiNLtSRJktTIUi1JkiQ1slRLkiRJjSzVkiRJUiNLtSRJktTIUi1JkiQ1slRLkiRJjZpKdZKDklyY5EtJtiT52XEFkyRJkoZiRePr3wZ8uKp+NcmDgP3GkEmSJEkalEWX6iQHAj8PnApQVd8Hvj+eWJIkSdJw
tEz/eBywFTg7yeeSvDvJQ+fvlGRdkk1JNm3durXhcJIkSdJkainVK4BjgL+rqicD3wbWz9+pqjZWzyP4gQAACzRJREFU1WxVzc7MzDQcTpIkSZpMLaX6FuCWqrqye34ho5ItSZIkTZVFl+qq+jpwc5KjuqFjgS+OJZUkSZI0IK2rf7wWOK9b+eN64GXtkSRJkqRhaSrVVXU1MDumLJIkSdIgeUdFSZIkqZGlWpIkSWpkqZYkSZIaWaolSZKkRpZqSZIkqZGlWpIkSWpkqZYkSZIaWaolSZKkRpZqSZIkqZGlWpIkSWpkqZYkSZIaWaolSZKkRpZqSZIkqZGlWpIkSWpkqZYkSZIaNZfqJPsm+VySD4wjkCRJkjQ04zhTfTqwZQzvI0mSJA1SU6lOshI4EXj3eOJIkiRJw7Oi8fVvBV4PHLCzHZKsA9YBrFq1qvFww7V6/aUPGLthw4k9JJF2bEc/o+DPqSRJC7HoM9VJngfcUVWbd7VfVW2sqtmqmp2ZmVns4SRJkqSJ1TL945nA85PcAFwAPDvJe8aSSpIkSRqQRZfqqnpjVa2sqtXAycDHqurFY0smSZIkDYTrVEuSJEmNWi9UBKCqrgCuGMd7SZIkSUPjmWpJkiSpkaVakiRJamSpliRJkhpZqiVJkqRGlmpJkiSpkaVakiRJamSpliRJkhpZqiVJkqRGlmpJkiSpkaVakiRJamSpliRJkhpZqiVJkqRGlmpJkiSpkaVakiRJarToUp3ksCQfT7IlyTVJTh9nMEmSJGkoVjS8dhvwO1V1VZIDgM1JLq+qL44pmyRJkjQIiz5TXVVfq6qrusffBLYAjxlXMEmSJGkoxjKnOslq4MnAleN4P0mSJGlIWqZ/AJBkf+Ai4Leq6n92sH0dsA5g1apVrYebKKvXX/qAsRs2nDix77s37Cirdm1P/vsu5c+YJElavKYz1Ul+hFGhPq+q/mlH+1TVxqqararZmZmZlsNJkiRJE6ll9Y8AZwJbquot44skSZIkDUvLmepnAi8Bnp3k6u7ruWPKJUmSJA3GoudUV9WngIwxiyRJkjRI3lFRkiRJamSpliRJkhpZqiVJkqRGlmpJkiSpkaVakiRJamSpliRJkhpZqiVJkqRGi16num+r11+6w/EbNpy4oH13tN+e7rsnufaW1uNN8vfWain/Wy705265a/3MW4+1M3srgyRJ23mmWpIkSWpkqZYkSZIaWaolSZKkRpZqSZIkqZGlWpIkSWpkqZYkSZIaWaolSZKkRk2lOsmaJNcm+UqS9eMKJUmSJA3Jokt1kn2BdwAnAI8HTkny+HEFkyRJkoai5Uz1U4GvVNX1VfV94ALgpPHEkiRJkoajpVQ/Brh5zvNbujFJkiRpqqSqFvfC5IXAc6rqN7rnLwGeWlWvnbffOmBd9/Qo4NrFx91rHgnc2XeIBkPOb/Z+DDk7LE3+x1bVzF4+hiRpmVjR8NpbgMPmPF8J3DZ/p6raCGxsOM5el2RTVc32nWOxhpzf7P0YcnYYfn5J0vLTMv3js8CRSQ5P8iDgZOCS8cSSJEmShmPRZ6qraluS1wAfAfYFzqqqa8aWTJIkSRqIlukfVNUHgQ+OKUufJnp6ygIMOb/Z+zHk7DD8/JKkZWbRFypKkiRJGvE25ZIkSVIjS7UkSZLUyFItSZIkNWq6UHHIkhzC6A6QBdxWVbf3HGmPJHk4UFV1V99Z9pSfvSRJWm6m7kLFJEcD7wQeBtzaDa8E7gZeXVVX9ZVtd5KsAv4cOJZR3gAHAh8D1lfVDf2l2z0/+/4keRiwhjm/zAAfqaq7ew22QEl+AjiJ++e/pKq29BpMkqTONE7/OAc4vap+sqqO675+Avgt4Ox+o+3We4H3A4+qqiOr6gjg0cA/Axf0mmxhzsHPfskleSlwFfCLwH7AQ4FnAZu7bRMtyRsYfcYBPsPoxlMBzk+yvs9skiRtN41nqq+rqiN3su0rXVmaSLvJvtNtk8LPvh9JrgWeNv+s
dJKDgSur6sf7SbYwSb4M/FRV/WDe+IOAayb5s5ckTY9pnFP9oSSXAn8P3NyNHQa8FPhwb6kWZnOSM4BzuX/2tcDneku1cH72/QijKRPz3dttm3T3AocCN84bf3S3TZKk3k3dmWqAJCdw3/zMALcwmp850XeH7M7MvYIdZAfOrKr/7THegvjZL70ka4E/BC7jvl8IVgHHA39cVef0FG1BkqwB3g5cx/3zHwG8pqom/RcySdIUmMpSLU2bbqrHc7j/LwQfGcoKJkn2AZ7K/fN/tqp+2GswSZI6Uzf9o1sF4Y2Mzjj+WDd8B3AxsGGSV0NIsoLR2dIXcP9VEC5mdLb0B7t4ee/87PtTVXcl+Tj3X8pwEIW6U3O+7p3zryRJE2HqzlQn+QijZdDOraqvd2OPAk4Fjq2q43uMt0tJzme0nNu5jM7UwWhJurXAw6vqRX1lWwg/+37MW8rwFkZnegexlCFAkl8CzmA0/WPuUoxHMMp/WV/ZJEnabhpL9bVVddSebpsEu8n+5QGs4uBn34MkVwOvqqor540/HXhXVT2pn2QLk2QLcML8tcCTHA58sKp+spdgkiTNMY3rVN+Y5PXdXf2A0R3+urVwb97F6ybBXUle2M0vBUZzTZO8CBjCn/L97Pvx0PmFGqCqPs1ozepJt4L7/jow163AjyxxFkmSdmjq5lQDLwLWA5/oyl0BtzNaxeHX+gy2ACcDfwackeQuRn/Gfxjw8W7bpFsOn/07kmyf+30Qw/jsh7yUIcBZwGeTXMD9858MnNlbKkmS5pi66R/w/7c8Xgl8uqq+NWd8zVCW50ryCEal+q1V9eK+8yxEkqcBX6qqe5Lsx6hgHwNcA/xpVd3Ta8Bd6JbUO4XRxYlXAScAz2CUfeOkX6g41KUMt0vyeOD5PDD/F3sNJklSZ+pKdZLTgN8EtgBHM7pt9sXdtquq6pg+8+1Kkkt2MPxsRhf/UVXPX9pEeybJNcCTqmpbko3At4GLgGO78V/pNeAuJDmP0V92HgLcw2jaxPsZZU9Vre0xniRJ6tk0Tv94JfAzVfWtJKuBC5Osrqq3Mfl3l1sJfBF4N6OpEwGeAvxVn6H2wD5Vta17PDvnF5hPdRfTTbKfrqondkvr3QocWlU/TPIe4D97zrZLQ17KECDJgYzyr2R0YeL5c7adUVWv7i2cJEmdabxQcd/tUz661QR+ETghyVuY/FI9C2wG3gTcU1VXAN+tqk9U1Sd6TbYwX0jysu7xfyaZBUjy48BET58A9ummgBwA7MdoLjvAg5n8i+X+kdHFlM+qqkdU1SOAZzFaUu99vSZbmLMZ/b95EXBKkouSPLjb9vT+YkmSdJ9pnP7xMeC3q+rqOWMrGF0M9etVtW9v4RYoyUrgrxld5Pf8qlrVc6QF6c6Yvg34OeBORvOpb+6+TquqiT3jm+R1wGuBfRn9ZeAk4HpGpe7CqvqjHuPt0pCXMoTRkoBVdfSc528CnstojvXlkzxlS5I0PaaxVK8Etm2/+ci8bc+sqn/rIdaiJDkReGZV/X7fWfZEkgOAx9EtlVZVt/ccaUGSHApQVbclOQg4Dripqj7Tb7JdS3IZ8K+Mbrpzezd2CKOb7hxfVcf1GG+3unWqf6qq7p0zthZ4PbB/VT22t3CSJHWmrlRL0ybJwYxWWpk7p3r7UoYbJv125Un+HLisqv513vga4G+r6sh+kkmSdB9LtTTFkrysqs7uO8diDT2/JGn5sFRLUyzJTUOZk78jQ88vSVo+pnFJPWmqJPn8zjYBh+xk28QYen5J0nSwVEvL3yHAcxgtqzdXgH9f+jh7bOj5JUlTwFItLX8fYLRKxgNusJPkiqWPs8eGnl+SNAWcUy1JkiQ1msY7KkqSJEljZamWJEmSGlmqJUmSpEaWakmSJKmRpVqSJElq9H+qFBdd65qmpQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 864x864 with 6 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "df.hist(column='length', by='product_type', bins=50,figsize=(12,12));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The longest description has almost 200 characters, while some descriptions are very short."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Text Pre-processing\n",
    "\n",
    "In this step we remove punctuation, stop words and other unwanted characters. We then convert all words to lower case and stem them with the Porter stemmer from the NLTK package.\n",
    "\n",
    "**Stemming** reduces words that share a root to a common stem. For example, \"fishing\" and \"fished\" are both reduced to \"fish\"."
   ]
  },
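  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small illustration of the stemming step described above, the sketch below applies NLTK's `PorterStemmer` to a few related word forms. The word list and the `demo_` names are illustrative assumptions and are not taken from the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nltk.stem import PorterStemmer\n",
    "\n",
    "# Illustrative word forms (an assumption, not taken from the dataset)\n",
    "demo_words = ['fishing', 'fished', 'fish']\n",
    "\n",
    "demo_stemmer = PorterStemmer()\n",
    "demo_stems = [demo_stemmer.stem(word) for word in demo_words]\n",
    "\n",
    "# Print each word next to the stem it is reduced to\n",
    "for word, stem in zip(demo_words, demo_stems):\n",
    "    print(word, '->', stem)"
   ]
  },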
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function below cleans the text, removes stop words and applies stemming to each description:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['i', \"you've\", 'himself', 'they', 'that', 'been', 'a', 'while', 'through', 'in', 'here', 'few', 'own', 'just', 're', 'doesn', 'ma', \"shouldn't\"]\n"
     ]
    }
   ],
   "source": [
    "stop = stopwords.words('english')\n",
    "print(stop[::10])\n",
    "\n",
    "porter = PorterStemmer()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "def preprocess_data(text):\n",
    "    '''Remove non-letter characters and stop words, lower-case the text\n",
    "    and apply Porter stemming.'''\n",
    "    \n",
    "    # Replace every character that is not a letter with a space\n",
    "    text = re.sub(\"[^a-zA-Z]\", \" \", text)\n",
    "    # Split the cleaned text (not the raw input) and drop stop words\n",
    "    words = [word.lower() for word in text.split() if word.lower() not in stop]\n",
    "    # Reduce each remaining word to its stem\n",
    "    words = [porter.stem(word) for word in words]\n",
    "    return \" \".join(words)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Apply the function to each example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['description'] = df['description'].apply(preprocess_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'lippi stix formula contain vitamin e, mango, avocado, shea butter ad comfort moisture. none lippi formula contain nasti ingredi like paraben sulfates.'"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['description'][2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### POS Tagging\n",
    "\n",
    "To get deeper insight into how each word is used in a sentence, we apply part-of-speech (POS) tagging. There are eight primary parts of speech, and each has corresponding tags; NLTK's default tagger uses the finer-grained Penn Treebank tag set (for example, `NN` for a singular noun and `JJ` for an adjective). The NLTK library provides a method to perform POS tagging.\n",
    "\n",
    "An example of POS tagging on the analyzed dataset is shown below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('lippi', 'JJ'),\n",
       " ('pencil', 'NN'),\n",
       " ('long-wear', 'JJ'),\n",
       " ('high-intens', 'NNS'),\n",
       " ('lip', 'NN'),\n",
       " ('pencil', 'NN'),\n",
       " ('glide', 'NN'),\n",
       " ('easili', 'FW'),\n",
       " ('prevent', 'NN'),\n",
       " ('feathering', 'NN'),\n",
       " ('.', '.'),\n",
       " ('mani', 'NN'),\n",
       " ('lippi', 'JJ'),\n",
       " ('stix', 'NN'),\n",
       " ('coordin', 'NN'),\n",
       " ('lippi', 'NN'),\n",
       " ('pencil', 'NN'),\n",
       " ('design', 'NN'),\n",
       " ('compliment', 'NN'),\n",
       " ('perfectly', 'RB'),\n",
       " (',', ','),\n",
       " ('feel', 'VB'),\n",
       " ('free', 'JJ'),\n",
       " ('mix', 'NN'),\n",
       " ('match', 'NN'),\n",
       " ('!', '.')]"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokens = df['description'][0]\n",
    "\n",
    "nltk.pos_tag(word_tokenize(tokens))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bag of words\n",
    "\n",
    "To run machine learning algorithms we need to convert the text into numerical feature vectors. We use the bag-of-words model: each document is split into words, each unique word is assigned an integer id, and we count how many times each word occurs in each document. Each unique word in the vocabulary corresponds to one feature.\n",
    "\n",
    "More precisely, we convert the text documents to a matrix of token counts (`CountVectorizer`) and then transform the count matrix into a normalized TF-IDF representation (`TfidfTransformer`).\n",
    "\n",
    "\n",
    "#### CountVectorizer "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "vectorizer = CountVectorizer()\n",
    "vectorizer.fit(df['description'])\n",
    "vector = vectorizer.transform(df['description'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(906, 5519)\n",
      "[[0 0 0 ... 0 0 0]\n",
      " [0 0 0 ... 0 0 0]\n",
      " [0 0 0 ... 0 0 0]\n",
      " ...\n",
      " [0 0 0 ... 0 0 0]\n",
      " [0 0 0 ... 0 0 0]\n",
      " [0 0 0 ... 0 0 0]]\n"
     ]
    }
   ],
   "source": [
    "print(vector.shape)\n",
    "print(vector.toarray())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### TF-IDF\n",
    "Term Frequency–Inverse Document Frequency\n",
    "\n",
    "Extract the **tfidf** representation matrix of the text data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0., 0., 0., ..., 0., 0., 0.],\n",
       "       [0., 0., 0., ..., 0., 0., 0.],\n",
       "       [0., 0., 0., ..., 0., 0., 0.],\n",
       "       ...,\n",
       "       [0., 0., 0., ..., 0., 0., 0.],\n",
       "       [0., 0., 0., ..., 0., 0., 0.],\n",
       "       [0., 0., 0., ..., 0., 0., 0.]])"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tfidf_converter = TfidfTransformer()\n",
    "X_tfidf = tfidf_converter.fit_transform(vector).toarray()\n",
    "X_tfidf"
   ]
  },
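  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The full matrices above are too large and sparse to inspect by eye. The sketch below repeats the same two steps (token counts, then TF-IDF) on three toy documents so that the vocabulary and both matrices can be read directly. The documents and the `demo_` names are illustrative assumptions, not part of the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer\n",
    "\n",
    "# Three toy documents (an assumption, not taken from the dataset)\n",
    "demo_docs = ['red lip pencil', 'black eye pencil', 'red nail polish']\n",
    "\n",
    "# Step 1: learn the vocabulary and count each word per document\n",
    "demo_vectorizer = CountVectorizer()\n",
    "demo_counts = demo_vectorizer.fit_transform(demo_docs)\n",
    "\n",
    "# Column indices follow the alphabetical order of the vocabulary\n",
    "demo_vocabulary = sorted(demo_vectorizer.vocabulary_)\n",
    "print(demo_vocabulary)\n",
    "print(demo_counts.toarray())\n",
    "\n",
    "# Step 2: reweight the counts with TF-IDF; by default each row is L2-normalized\n",
    "demo_tfidf = TfidfTransformer().fit_transform(demo_counts)\n",
    "print(np.round(demo_tfidf.toarray(), 2))"
   ]
  },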
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data preparation\n",
    "\n",
    "Now we split the dataset into training and test sets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = df['description']\n",
    "y = df['product_type']\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((679,), (227,), (679,), (227,))"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_train.shape, X_test.shape, y_train.shape, y_test.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build model\n",
    "\n",
    "We compare several machine learning algorithms and choose the most accurate one for our problem. The classification models are: Logistic Regression, Multinomial Naive Bayes, Linear Support Vector Machine (LinearSVC), Random Forest and Gradient Boosting.\n",
    "\n",
    "#### Pipeline\n",
    "\n",
    "To make the vectorizer => transformer => classifier chain easier to work with, we use the `Pipeline` class from scikit-learn. A pipeline performs all of the steps above with less code.\n",
    "\n",
    "**Models**\n",
    "\n",
    "First we train and evaluate each model separately. To evaluate the models we use the accuracy score and the classification report (precision, recall, F1-score).\n",
    "\n",
    "Accuracy is one of the most common classification metrics: the number of correct predictions divided by the total number of predictions.\n",
    "\n",
    "#### 1. Logistic Regression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_log = Pipeline([('vect', CountVectorizer(min_df=5, ngram_range=(1,2))),\n",
    "                      ('tfidf', TfidfTransformer()),\n",
    "                      ('model',LogisticRegression()),\n",
    "                     ])\n",
    "\n",
    "model_log.fit(X_train, y_train)\n",
    "\n",
    "ytest = np.array(y_test)\n",
    "pred = model_log.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 0.9295154185022027\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "     contour       0.90      0.93      0.91        28\n",
      "  eye_makeup       0.91      1.00      0.95       106\n",
      "  foundation       1.00      0.86      0.92        43\n",
      "    lipstick       0.94      0.82      0.88        39\n",
      " nail_polish       1.00      0.91      0.95        11\n",
      "\n",
      "    accuracy                           0.93       227\n",
      "   macro avg       0.95      0.90      0.92       227\n",
      "weighted avg       0.93      0.93      0.93       227\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print('accuracy %s' % accuracy_score(pred, y_test))\n",
    "print(classification_report(ytest, pred))"
   ]
  },
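  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the metrics in the report concrete, the sketch below computes accuracy by hand on five toy labels and compares it with `accuracy_score`, then prints per-class precision and recall. The labels and the `demo_` names are illustrative assumptions and are unrelated to the results above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import accuracy_score, precision_score, recall_score\n",
    "\n",
    "# Toy labels (an assumption, not taken from the notebook's results)\n",
    "demo_true = ['eye', 'eye', 'lip', 'lip', 'nail']\n",
    "demo_pred = ['eye', 'eye', 'lip', 'eye', 'nail']\n",
    "\n",
    "# Accuracy by hand: correct predictions divided by all predictions\n",
    "demo_correct = sum(t == p for t, p in zip(demo_true, demo_pred))\n",
    "demo_accuracy = demo_correct / len(demo_true)\n",
    "print(demo_accuracy, accuracy_score(demo_true, demo_pred))\n",
    "\n",
    "# Per-class precision and recall, in the order given by demo_labels\n",
    "demo_labels = ['eye', 'lip', 'nail']\n",
    "demo_precision = precision_score(demo_true, demo_pred, labels=demo_labels, average=None)\n",
    "demo_recall = recall_score(demo_true, demo_pred, labels=demo_labels, average=None)\n",
    "print(demo_precision)\n",
    "print(demo_recall)"
   ]
  },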
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. LinearSVC"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "svc = Pipeline([('vect', CountVectorizer(min_df=5, ngram_range=(1,2))),\n",
    "               ('tfidf', TfidfTransformer()),\n",
    "               ('model',LinearSVC()),\n",
    "               ])\n",
    "\n",
    "svc.fit(X_train, y_train)\n",
    "\n",
    "ytest = np.array(y_test)\n",
    "y_pred = svc.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 0.960352422907489\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "     contour       0.90      0.96      0.93        28\n",
      "  eye_makeup       0.95      1.00      0.98       106\n",
      "  foundation       1.00      0.91      0.95        43\n",
      "    lipstick       0.97      0.90      0.93        39\n",
      " nail_polish       1.00      1.00      1.00        11\n",
      "\n",
      "    accuracy                           0.96       227\n",
      "   macro avg       0.97      0.95      0.96       227\n",
      "weighted avg       0.96      0.96      0.96       227\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print('accuracy %s' % accuracy_score(y_pred, y_test))\n",
    "print(classification_report(ytest, y_pred))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Naive Bayes Classifier "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "nbc = Pipeline([('vect', CountVectorizer(min_df=5, ngram_range=(1,2))),\n",
    "               ('tfidf', TfidfTransformer()),\n",
    "               ('model',MultinomialNB()),\n",
    "               ])\n",
    "\n",
    "nbc.fit(X_train, y_train)\n",
    "\n",
    "ytest = np.array(y_test)\n",
    "pred_y = nbc.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 0.9074889867841409\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "     contour       0.87      0.93      0.90        28\n",
      "  eye_makeup       0.87      0.99      0.93       106\n",
      "  foundation       1.00      0.86      0.92        43\n",
      "    lipstick       0.97      0.79      0.87        39\n",
      " nail_polish       1.00      0.64      0.78        11\n",
      "\n",
      "    accuracy                           0.91       227\n",
      "   macro avg       0.94      0.84      0.88       227\n",
      "weighted avg       0.92      0.91      0.91       227\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print('accuracy %s' % accuracy_score(pred_y, y_test))\n",
    "print(classification_report(ytest, pred_y))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. Random Forest"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [],
   "source": [
    "rf = Pipeline([('vect', CountVectorizer(min_df=5, ngram_range=(1,2))),\n",
    "               ('tfidf', TfidfTransformer()),\n",
    "               ('rf', RandomForestClassifier(n_estimators=50)),\n",
    "               ])\n",
    "\n",
    "rf.fit(X_train, y_train)\n",
    "\n",
    "ytest = np.array(y_test)\n",
    "preds = rf.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 0.920704845814978\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "     contour       0.87      0.93      0.90        28\n",
      "  eye_makeup       0.90      0.99      0.94       106\n",
      "  foundation       0.97      0.84      0.90        43\n",
      "    lipstick       0.97      0.79      0.87        39\n",
      " nail_polish       1.00      1.00      1.00        11\n",
      "\n",
      "    accuracy                           0.92       227\n",
      "   macro avg       0.94      0.91      0.92       227\n",
      "weighted avg       0.93      0.92      0.92       227\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print('accuracy %s' % accuracy_score(preds, y_test))\n",
    "print(classification_report(ytest, preds))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5. Gradient Boosting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_gb = Pipeline([('vect', CountVectorizer(min_df=5, ngram_range=(1,2))),\n",
    "                    ('tfidf', TfidfTransformer()),\n",
    "                    ('gb', GradientBoostingClassifier(n_estimators=50)),\n",
    "                    ])\n",
    "\n",
    "model_gb.fit(X_train, y_train)\n",
    "\n",
    "ytest = np.array(y_test)\n",
    "predicted = model_gb.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 0.920704845814978\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "     contour       0.84      0.93      0.88        28\n",
      "  eye_makeup       0.91      0.99      0.95       106\n",
      "  foundation       0.95      0.84      0.89        43\n",
      "    lipstick       1.00      0.79      0.89        39\n",
      " nail_polish       1.00      1.00      1.00        11\n",
      "\n",
      "    accuracy                           0.92       227\n",
      "   macro avg       0.94      0.91      0.92       227\n",
      "weighted avg       0.93      0.92      0.92       227\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print('accuracy %s' % accuracy_score(predicted, y_test))\n",
    "print(classification_report(ytest, predicted))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also write a function that builds and evaluates all of the models with a pipeline:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_models(X_train, X_test, y_train, y_test):\n",
    "    classifiers = [\n",
    "        LogisticRegression(),\n",
    "        LinearSVC(),\n",
    "        MultinomialNB(),\n",
    "        RandomForestClassifier(n_estimators=50),\n",
    "        GradientBoostingClassifier(n_estimators=50), ]\n",
    "\n",
    "    rows = []\n",
    "    for classifier in classifiers:\n",
    "        # Same vectorizer and TF-IDF settings for every classifier\n",
    "        pipeline = Pipeline(steps=[('vect', CountVectorizer(\n",
    "                               min_df=5, ngram_range=(1, 2))),\n",
    "                                    ('tfidf', TfidfTransformer()),\n",
    "                                    ('classifier', classifier)])\n",
    "        pipeline.fit(X_train, y_train)\n",
    "        # Pipeline.score returns the mean accuracy on the test set\n",
    "        score = pipeline.score(X_test, y_test)\n",
    "        rows.append({'Model': classifier.__class__.__name__,\n",
    "                     'Score': score})\n",
    "\n",
    "    # Build the table in one step; DataFrame.append was removed in pandas 2.0\n",
    "    models = pd.DataFrame(rows)\n",
    "    print(models.sort_values(by='Score', ascending=False))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                        Model     Score\n",
      "1                   LinearSVC  0.960352\n",
      "0          LogisticRegression  0.929515\n",
      "3      RandomForestClassifier  0.920705\n",
      "4  GradientBoostingClassifier  0.920705\n",
      "2               MultinomialNB  0.907489\n"
     ]
    }
   ],
   "source": [
    "get_models(X_train, X_test, y_train, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Best model**\n",
    "\n",
    "We have tested several models; now we check which one is the best:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "log_acc = accuracy_score(pred, y_test)\n",
    "svm_acc = accuracy_score(y_pred, y_test)\n",
    "nb_acc = accuracy_score(pred_y, y_test)\n",
    "rf_acc = accuracy_score(preds, y_test)\n",
    "gb_acc = accuracy_score(predicted, y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Model</th>\n",
       "      <th>Score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>SVC</td>\n",
       "      <td>0.960352</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Logistic Regression</td>\n",
       "      <td>0.929515</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Random Forest</td>\n",
       "      <td>0.920705</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Gradient Boosting</td>\n",
       "      <td>0.920705</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Naive Bayes</td>\n",
       "      <td>0.907489</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 Model     Score\n",
       "1                  SVC  0.960352\n",
       "0  Logistic Regression  0.929515\n",
       "3        Random Forest  0.920705\n",
       "4    Gradient Boosting  0.920705\n",
       "2          Naive Bayes  0.907489"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "models = pd.DataFrame({\n",
    "                      'Model': ['Logistic Regression', 'SVC', 'Naive Bayes', 'Random Forest', 'Gradient Boosting'],\n",
    "                      'Score': [log_acc, svm_acc, nb_acc, rf_acc, gb_acc]})\n",
    "models.sort_values(by='Score', ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the above analysis we see that the best model is LinearSVC, with an accuracy of 96%. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Predictions\n",
    "\n",
    "We can now check the model's prediction on an example text (a makeup product description).\n",
    "\n",
    "First we load our trained model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('models/SVC_model.pkl', 'rb') as f:\n",
    "    model = load(f)"
   ]
  },
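  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above loads a pickled model, but the step that created the file is not shown in this part of the notebook. The sketch below shows one way a fitted pipeline could be saved and restored, assuming the standard-library `pickle` module. The toy data, the `demo_` names and the temporary file path are illustrative assumptions, not the notebook's actual training data or model file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pickle\n",
    "import tempfile\n",
    "\n",
    "from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.svm import LinearSVC\n",
    "\n",
    "# Toy training data (an assumption, not the makeup dataset)\n",
    "demo_texts = ['long lasting lip colour', 'creamy lip gloss',\n",
    "              'volume mascara for lashes', 'waterproof eyeliner for eyes']\n",
    "demo_targets = ['lipstick', 'lipstick', 'eye_makeup', 'eye_makeup']\n",
    "\n",
    "demo_pipeline = Pipeline([('vect', CountVectorizer()),\n",
    "                          ('tfidf', TfidfTransformer()),\n",
    "                          ('model', LinearSVC())])\n",
    "demo_pipeline.fit(demo_texts, demo_targets)\n",
    "\n",
    "# Save the fitted pipeline; a temporary directory stands in for 'models/'\n",
    "demo_path = os.path.join(tempfile.mkdtemp(), 'demo_model.pkl')\n",
    "with open(demo_path, 'wb') as f:\n",
    "    pickle.dump(demo_pipeline, f)\n",
    "\n",
    "# Load it back and predict with the restored object\n",
    "with open(demo_path, 'rb') as f:\n",
    "    demo_restored = pickle.load(f)\n",
    "\n",
    "demo_queries = ['lip gloss', 'mascara for eyes']\n",
    "print(demo_restored.predict(demo_queries))"
   ]
  },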
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Example text (the description of a real makeup product, a mascara):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['The immediate effect of lengthened and thickened eyelashes without splashing or smudging, perfectly resists high temperatures and humidity, so you can spend the day actively. Safe for sensitive eyes and people wearing contact lenses.']\n"
     ]
    }
   ],
   "source": [
    "text = [\"The immediate effect of lengthened and thickened eyelashes without splashing or smudging, perfectly resists high temperatures and humidity, so you can spend the day actively. Safe for sensitive eyes and people wearing contact lenses.\"]\n",
    "print(text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can check the result and predicted class:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['eye_makeup']\n"
     ]
    }
   ],
   "source": [
    "prediction = model.predict(text)\n",
    "\n",
    "print(prediction)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model assigns the mascara description to the `eye_makeup` category, as expected."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The aim of this project was multi-class text classification of products based on their descriptions: given a text as input, we predict the product category. For the analysis we used Python with the scikit-learn and NLTK libraries. We started with data engineering and text pre-processing, which covers removing punctuation and stop words and applying stemming. Next we used the bag-of-words model to convert the text into numerical feature vectors. We then trained five classification models (Logistic Regression, Multinomial Naive Bayes, Linear Support Vector Machine (LinearSVC), Random Forest and Gradient Boosting) to find the best one. Using accuracy as the metric, the best algorithm was LinearSVC with an accuracy of 96%, a few percentage points ahead of the other models, which scored between 91% and 93%."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
